International Journal of Computer Engineering and Technology (IJCET)
ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online)
Volume 1, Number 2, Sept – Oct (2010), pp. 38-46
© IAEME, http://www.iaeme.com/ijcet.html


     EFFICIENT TEXT COMPRESSION USING SPECIAL
  CHARACTER REPLACEMENT AND SPACE REMOVAL
                                 Debashis Chakraborty
                     Department of Computer Science & Engineering
                     St. Thomas’ College of Engineering & Technology
                                 Kolkata-23, West Bengal
                            E-Mail: sunnydeba@gmail.com

                                    Sutirtha Ghosh
                          Department of Information Technology
                     St. Thomas’ College of Engineering & Technology
                                 Kolkata-23, West Bengal
                             E-Mail: sutirtha84@yahoo.co.in

                                  Joydeep Mukherjee
                          Department of Information Technology
                     St. Thomas’ College of Engineering & Technology
                                 Kolkata-23, West Bengal

ABSTRACT
       In this paper, we propose a new text compression/decompression algorithm based on a
special character replacement technique. After the initial compression obtained by replacing
words with special characters, the spaces between the words in the intermediary compressed
file are removed in specific situations to produce the final compressed text file.
Experimental results show that the proposed algorithm is simple to implement, fast in
encoding time and high in compression ratio, and gives better compression than existing
algorithms and tools such as LZW, WINZIP 10.0 and WINRAR 3.93.
Keywords: Lossless compression; Lossy compression; Non-printable ASCII value; Special
character; Index; Symbols.






INTRODUCTION
        As evident from the name itself, data compression is concerned with the
compression of a given set of data [5,6,8]. The primary reason behind doing so is to
reduce the storage space required to save the data, or the bandwidth required to transmit
it. Although storage technology has developed significantly over the past decade, the
same cannot be said for transmission capacity. As a result the concept of compressing
data becomes very important. Data compression or source coding is the process of
encoding information using fewer bits (or other information-bearing units) than an
unencoded representation would use, through the use of specific encoding schemes. It follows
that the receiver must be aware of the encoding scheme in order to decode the data to its
original form. The compression schemes that are designed are basically trade-offs among
the degree of data compression, the amount of distortion introduced and the resources
(software and hardware) required to compress and decompress data [5,9]. Data
compression schemes may broadly be classified into (1) lossless compression and
(2) lossy compression. Lossless compression algorithms usually exploit statistical
redundancy in such a way as to represent the sender’s data more concisely without error.
Lossless compression is possible because most real world data has statistical redundancy.
Another kind of compression, called lossy data compression, is possible if some loss of
fidelity is acceptable. It is important to consider that in case of lossy compression, the
original data cannot be reconstructed from the compressed data due to rounding off or
removal of some parts of data as a result of redundancies. These types of compression are
also widely used in Image compression [10, 11, 12, 13]. The theoretical background of
compression is provided by information theory and by rate distortion theory. There is a
close connection between machine learning and compression: a system that predicts the
posterior probabilities of a sequence given its entire history can be used for optimal data
compression (by using arithmetic coding on the output distribution), while an optimal
compressor can be used for prediction (by finding the symbol that compresses best, given
the previous history). This equivalence has been used as justification for data
compression as a benchmark for "general intelligence". We hereby focus on the
compression of text. Various algorithms have been proposed for text compression
[1,2,3,4,7]. We propose an efficient text compression algorithm that yields better
compression than existing algorithms such as Lempel-Ziv-Welch (LZW) and existing software
such as Winzip 10.0 and Winrar 3.93, while ensuring that the compression is lossless. Our
proposed algorithm is based on a systematic special character replacement technique.
        The rest of this paper is organized as follows: Section 1 presents the concept of
special character replacement. Section 2 describes the creation and maintenance of the
dynamic dictionary. Section 3 describes the removal of spaces between special symbols in
the intermediary compressed file. Section 4 gives the proposed algorithm, Section 5
describes the experimental results and Section 6 concludes the paper.
1. SPECIAL CHARACTER REPLACEMENT
        In the proposed algorithm we replace every word in a text with an ASCII character.
The extended ASCII character set provides two hundred and fifty-four (254) characters.
Among these, some are reserved for NULL, Space, Linefeed or the English alphabets.
Neglecting them, one hundred and eighty-four (184) ASCII characters remain and are used in
this proposed algorithm. One letter or two letter English words in the text file are not
replaced with an ASCII character; a non-printable ASCII character replaces only words
having more than two letters. For example, the word ‘of’ remains the same, whereas the
word ‘name’ is replaced by the non-printable ASCII character corresponding to index ‘1’.
Whenever a new word is found, we maintain an index (integer) for it, and the corresponding
special ASCII character replaces the word in the compressed text file. When the word is
repeated in the file, it is replaced by the same ASCII character assigned to it previously.
        In this algorithm, we use one hundred and eighty-four symbols for the first one
hundred and eighty-four distinct words. Once the number of words exceeds this value, we
combine ASCII characters to generate new symbols for the new words in the text file. When
a space is encountered between two words, it is replaced with the integer ‘0’. The symbol
‘9’ is used to mark the end of a statement, so that the termination of a sentence can be
identified during decompression of the text file from the compressed file. For example,
suppose there is a line of text:
      My name is Debashis Chakraborty.
        Assuming this is the first sentence in the file, and following the proposed
algorithm, the words ‘My’ and ‘is’ are kept unchanged in the compressed file. ‘name’,
being the first word to be compressed, is assigned the index ‘1’ and is replaced with the
corresponding ASCII character. A similar process is repeated for the other words whose
length is greater than two. We also replace each space between words with ‘0’ and the ‘.’
with ‘9’. Therefore the corresponding compressed sentence for the above example is:
                 My0$0is0#0&9
        where $, # and & are the non-printable ASCII characters assigned to the indices
‘1’ (for ‘name’), ‘2’ (for ‘Debashis’) and ‘3’ (for ‘Chakraborty’) respectively, each
occupying one byte of memory. The original line of text occupies 32 bytes, whereas the
compressed line occupies 12 bytes. Thus the proposed method achieves substantial
compression of text, resulting in better transmission bandwidth management and reduced
storage requirements.
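
        As an illustration only (it is not part of the paper's implementation), the
following C fragment sketches the replacement pass described above. The helper
symbol_for_word() is a hypothetical stand-in for the dynamic dictionary of the next
section: it simply hands out the byte values 0x80, 0x81, ... to distinct words in order of
first appearance, so this demo handles at most 120 distinct words and prints the
intermediary compressed form of the example sentence (the replacement symbols are
non-printable bytes).

    /* Sketch of the special character replacement pass of Section 1.
       Assumption: symbol_for_word() stands in for the dictionary of Section 2
       and hands out byte values 0x80, 0x81, ... in order of first appearance. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    #define DEMO_WORDS 120

    static char seen[DEMO_WORDS][64];     /* words seen so far (demo limit) */
    static int  nseen = 0;

    static unsigned char symbol_for_word(const char *word)
    {
        for (int i = 0; i < nseen; ++i)
            if (strcmp(seen[i], word) == 0)
                return (unsigned char)(0x80 + i);    /* repeated word      */
        if (nseen == DEMO_WORDS)
            return 0xFF;                             /* demo limit reached */
        strncpy(seen[nseen], word, 63);
        return (unsigned char)(0x80 + nseen++);      /* new word           */
    }

    /* Emit the intermediary compressed form of one line: one/two letter words
       are copied, longer words become symbols, every space becomes '0' and
       every '.' becomes '9'. */
    static void compress_line(const char *line, FILE *out)
    {
        char word[64];
        int  len = 0;
        for (const char *p = line; ; ++p) {
            if (*p && !isspace((unsigned char)*p) && *p != '.') {
                if (len < 63) word[len++] = *p;      /* accumulate a word */
                continue;
            }
            if (len > 0) {                           /* a word just ended */
                word[len] = '\0';
                if (len <= 2) fputs(word, out);                  /* kept   */
                else fputc(symbol_for_word(word), out);          /* symbol */
                len = 0;
            }
            if (*p == ' ')  fputc('0', out);
            if (*p == '.')  fputc('9', out);
            if (*p == '\0') break;
        }
    }

    int main(void)
    {
        /* Intermediary form of the running example, e.g. My0?0is0?0?9,
           where each ? is a one-byte replacement symbol. */
        compress_line("My name is Debashis Chakraborty.", stdout);
        putchar('\n');
        return 0;
    }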
2. CREATION OF DYNAMIC DICTIONARY
        Text compression algorithms should always be lossless compression algorithms,
preventing any loss of information: the text file regenerated from the compressed file
must be identical to the original file. All text compression algorithms maintain a
dictionary containing the words that appear in the text file. The original text file is
regenerated from the compressed file with the help of this dictionary. The dictionary
maintained can be either static or dynamic.
        In this proposed algorithm, a dynamic dictionary is used. We maintain a table
containing the fields named ‘Index’, ‘Symbol’ and ‘Word’ to form the dictionary.
Initially the table is empty. When a word to be compressed is encountered in the text
file, we check whether the word already exists in the table. Every time a new word is
found in the text file, we assign an integer value to it and tabulate its special symbol,
using a single non-printable ASCII character or a combination of such characters. The
assigned integer is stored under the Index field, the special symbol under the Symbol
field and the corresponding word under the Word field. Every time a new word is found in
the text, the dictionary is updated using the same procedure. When a word is repeated,
the symbol already assigned to that word is used.
        During the process of decompression, each special symbol in the compressed file
is looked up to obtain its corresponding integer value (index) and the corresponding
word. Finally the symbols are replaced with their corresponding words to regenerate the
original file.
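
        As an illustration of the table just described, the following C sketch stores the
three fields named in the text (Index, Symbol and Word). It is not the authors' code: the
paper does not list which one hundred and eighty-four extended-ASCII codes form the symbol
pool, so init_pool() below simply skips NUL, whitespace, digits and English letters and
keeps the first 184 remaining byte values, and the rule used to build two-character
combinations beyond the pool is likewise an assumed encoding.

    /* Illustrative dynamic dictionary with the fields Index, Symbol and Word.
       The pool of 184 usable codes and the pairing rule are assumptions.     */
    #include <ctype.h>
    #include <string.h>

    #define POOL      184
    #define MAX_WORDS 4096

    struct entry {
        int  index;        /* integer assigned in order of first appearance */
        char symbol[3];    /* one code or a two-code combination            */
        char word[64];     /* the original word                             */
    };

    static struct entry  dict[MAX_WORDS];
    static int           ndict = 0;
    static unsigned char pool[POOL];
    static int           pool_size = 0;

    static void init_pool(void)
    {
        for (int c = 1; c < 256 && pool_size < POOL; ++c) {
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
            if (isalnum(c)) continue;          /* skip letters and digits */
            pool[pool_size++] = (unsigned char)c;
        }
    }

    /* Return the symbol for 'word', adding a new dictionary entry if needed. */
    const char *dict_symbol(const char *word)
    {
        if (pool_size == 0) init_pool();
        for (int i = 0; i < ndict; ++i)
            if (strcmp(dict[i].word, word) == 0)
                return dict[i].symbol;                 /* repeated word    */

        struct entry *e = &dict[ndict];
        e->index = ndict;
        if (ndict < POOL) {                            /* single character */
            e->symbol[0] = (char)pool[ndict];
            e->symbol[1] = '\0';
        } else {                                       /* two-character combination */
            int k = ndict - POOL;
            e->symbol[0] = (char)pool[k / POOL];
            e->symbol[1] = (char)pool[k % POOL];
            e->symbol[2] = '\0';
        }
        strncpy(e->word, word, sizeof e->word - 1);
        ndict++;
        return e->symbol;
    }

    /* Decompression-side lookup: map a symbol back to its word (NULL if unknown). */
    const char *dict_word(const char *symbol)
    {
        for (int i = 0; i < ndict; ++i)
            if (strcmp(dict[i].symbol, symbol) == 0)
                return dict[i].word;
        return NULL;
    }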
3. REMOVAL OF SPACES FROM THE INTERMEDIARY COMPRESSED FILE
        Every text file contains spaces between the words so that different words can be
identified; the words are separated from each other by spaces. Here we propose a method
to remove these spaces, without losing any information, to obtain better compression.
        Replacing every word in the original file with a special symbol reduces the size
of the file and yields an intermediary compressed file. We do not remove the spaces from
the intermediary file; instead it contains ‘0’ to mark the location of each space between
words. Every replaced word in the original file is represented by either one special
symbol or a combination of two special symbols.
        In the intermediary compressed file, when a ‘0’ follows a single special symbol,
the contents are not modified. However, when a word is replaced by a combination of two
symbols, or when the word is a one or two letter word (no replacement in the intermediary
compressed file), we remove the ‘0’ after it, i.e. the space between the present word and
the next word is removed. For example, suppose there is a line of text:
                 My name is Debashis Chakraborty.
        Assuming the special symbol for ‘name’ is ‘$’, for ‘Debashis’ is ‘##’ and for
‘Chakraborty’ is ‘@’, after the final compression the output for the above sentence is:
                                  My$0is##@9
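
        A possible implementation of this space-removal pass is sketched below, again as
an illustration rather than the authors' code. It assumes that replacement symbols can be
told apart from literal letters by their byte values; the is_symbol_byte() predicate is
illustrative. Run over the intermediary form of the example above (‘My0$0is0##0@9’) it
produces exactly ‘My$0is##@9’.

    /* Space-removal pass over the intermediary compressed file: a '0' is kept
       only when it follows a one-character replacement symbol; after a
       two-character combination or a literal (unreplaced) word it is dropped. */
    #include <ctype.h>
    #include <stdio.h>

    static int is_symbol_byte(int c)        /* assumed way to recognise symbols */
    {
        return c != '0' && c != '9' && !isalpha(c);
    }

    void remove_spaces(FILE *in, FILE *out)
    {
        int c, token_len = 0, token_is_symbol = 0;

        while ((c = fgetc(in)) != EOF) {
            if (c == '0') {                          /* space marker          */
                if (token_is_symbol && token_len == 1)
                    fputc('0', out);                 /* keep after one symbol */
                token_len = 0;                       /* otherwise drop it     */
                continue;
            }
            if (c == '9') {                          /* end of sentence       */
                fputc('9', out);
                token_len = 0;
                continue;
            }
            fputc(c, out);                           /* copy the token byte   */
            if (token_len == 0)
                token_is_symbol = is_symbol_byte(c);
            token_len++;
        }
    }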
4. PROPOSED ALGORITHM
        We propose an algorithm that takes a text file as input. The proposed algorithm
compresses text files to sizes comparable with those produced by the Lempel-Ziv-Welch
algorithm, Winzip 10.0 and Winrar 3.93. The proposed algorithm is:





Algorithm for Compression
Step 1: Read the contents of the text file one word at a time.
Step 2: Create a dictionary containing the fields ‘Index’, ‘Symbol’ and ‘Word’. The
        dictionary is initially empty.
Step 3: Length Calculation
        •     Calculate the length of the word read from the text file.
        •     Write the original word into the intermediary file if the length of the word is
              less than or equal to two.
        •     If the read token is the single character ‘.’, replace it with ‘9’ in the
              compressed file; if the read character is a space between two words, replace
              it with ‘0’ in the compressed file.
        •     A word of length greater than two is replaced with a special symbol (a non-
              printable ASCII character or a combination of such characters).
Step 4: Special character replacement
        •     Check whether the word already exists in the dictionary.
        •     For a new word, assign an integer, which acts as the index, and a special
              symbol for the corresponding word.
        •     If the index of the word is less than one hundred and eighty-four, assign
              the index’s respective single-character ASCII symbol as its special symbol.
        •     If a word has an index of more than one hundred and eighty-three, combine
              ASCII characters to form the new symbol.
        •     Update the dictionary by inserting the new word along with its index value
              and assigned symbol, which can be used for future reference.
        •     When an existing word is repeated, replace it with the symbol previously
              assigned to it, as obtained from the dictionary.
Step 5: Continue the above process of compression and updating of the dynamic
            dictionary until the end of the original file is reached.
Step 6: Removal of spaces from the intermediary file
        •     Read the contents of the intermediary file, one special symbol at a time.
        •     Check whether the word in the original file was replaced by one special
              character as its symbol or by a combination of two.
        •     If there is a ‘0’ after a symbol consisting of one special character (the
              replacement of a word), retain the zero.
        •     If a word is represented by a combination of special characters, or appears
              as the word itself (a one or two letter word), remove the ‘0’ after it
              (representing the space between words) to obtain the final compressed file.
Step 7: Continue the above process until the end of the intermediary file is reached.
Algorithm for Decompression
Step 1: Read the symbols from the compressed file.
Step 2: If the read symbol is ‘0’, replace it with a space or tab. For the symbol ‘9’,
        write a ‘.’ to indicate the end of a sentence.
Step 3: Decoding of special characters
        •     If the read symbol from the compressed file is an English letter, write it
              unchanged into the decompressed file, and write a space or tab after the
              word in the decompressed file.
        •     For a special symbol, find its match in the dictionary and write the
              corresponding word into the decompressed file.
        •     If the special symbol is a combination of two special characters, write a
              space or tab after the corresponding word in the decompressed file.
Step 4: Continue the above process until the end of the compressed file is reached.
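
        A sketch of these decompression steps is given below. It rests on two assumptions
the paper leaves implicit: dict_word() (here passed in as the lookup parameter) stands for
the dictionary lookup of Section 2, and is_pair_lead() stands for some way of telling,
from its first byte, whether a symbol is a two-character combination; how that distinction
is made is not specified, so both helpers are supplied by the caller. Word-boundary
recovery between consecutive literal words follows the paper's simplified description.

    /* Sketch of decompression: '0' becomes a space, '9' becomes '.', literal
       letters are copied (at most two per word), and replacement symbols are
       looked up in the dictionary.                                           */
    #include <ctype.h>
    #include <stdio.h>

    typedef const char *(*lookup_fn)(const char *symbol);  /* e.g. dict_word() */
    typedef int         (*pair_fn)(int first_byte);        /* assumed helper   */

    /* Restore the space removed after a literal word or a two-symbol pair,
       unless the sentence ends right here.                                   */
    static void space_unless_period(FILE *in, FILE *out)
    {
        int d = fgetc(in);
        if (d != EOF) ungetc(d, in);
        if (d != '9' && d != EOF) fputc(' ', out);
    }

    void decompress(FILE *in, FILE *out, lookup_fn lookup, pair_fn is_pair_lead)
    {
        int  c;
        char sym[3];

        while ((c = fgetc(in)) != EOF) {
            if (c == '0') { fputc(' ', out); continue; }    /* explicit space  */
            if (c == '9') { fputc('.', out); continue; }    /* end of sentence */

            if (isalpha(c)) {                 /* literal one or two letter word */
                fputc(c, out);
                int d = fgetc(in);
                if (isalpha(d)) fputc(d, out);
                else if (d != EOF) ungetc(d, in);
                space_unless_period(in, out);
                continue;
            }

            sym[0] = (char)c;                 /* replacement symbol             */
            if (is_pair_lead(c)) {            /* two-character combination      */
                sym[1] = (char)fgetc(in);
                sym[2] = '\0';
            } else {
                sym[1] = '\0';
            }
            const char *w = lookup(sym);
            if (w) fputs(w, out);
            if (sym[1] != '\0')               /* the pair's '0' was removed, so */
                space_unless_period(in, out); /* restore the space here         */
        }
    }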
5. EXPERIMENTAL RESULTS
        The algorithm developed has been simulated using TURBO C. The input text files
considered are .txt, .rtf, .cpp and .c files. All the text files that we have tested are
of different sizes. The compression ratios obtained are tabulated in Table 1. The
compression ratio is better than that of the Lempel-Ziv-Welch algorithm, Winzip 10.0 and
Winrar 3.93 for the majority of the text files. All the text files reconstructed from the
compressed files are of the same size as the original files. Therefore the proposed
algorithm achieves lossless compression.





 Original file    Original size    Compression by     Compression by     Compression by     Compression by
                                   LZW                WINRAR 3.93        WINZIP 10.0        proposed algorithm
 sgjm1.txt        5046 bytes       3292 bytes (35%)   2056 bytes (59%)   2468 bytes (51%)   1537 bytes (69%)
 sgjm2.txt        7061 bytes       4357 bytes (38%)   2565 bytes (63%)   2547 bytes (64%)   2129 bytes (70%)
 sgjm3.rtf        2891 bytes       1842 bytes (36%)   1303 bytes (55%)   1269 bytes (56%)   896 bytes (69%)
 sgjm4.txt        431 bytes        388 bytes (9%)     260 bytes (39%)    231 bytes (46%)    158 bytes (63%)
 sgjm5.rtf        2037 bytes       1330 bytes (35%)   859 bytes (58%)    828 bytes (59%)    635 bytes (63%)
 sgjm6.txt        3369 bytes       2196 bytes (35%)   1545 bytes (54%)   1504 bytes (55%)   1110 bytes (67%)
 sgjm7.txt        10549 bytes      5457 bytes (47%)   3933 bytes (63%)   3923 bytes (63%)   3492 bytes (66%)
 sgjm8.txt        7584 bytes       4216 bytes (44%)   3067 bytes (59%)   3048 bytes (60%)   2389 bytes (68%)
 sgjm9.rtf        5529 bytes       3249 bytes (41%)   2351 bytes (57%)   2324 bytes (58%)   1793 bytes (67%)
 sgjm10.rtf       4152 bytes       2658 bytes (36%)   1869 bytes (55%)   1831 bytes (56%)   1428 bytes (66%)
 sgjm11.cpp       458 bytes        421 bytes (8%)     259 bytes (43%)    239 bytes (48%)    134 bytes (70%)
               Table 1 Compression of text files for different algorithms
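
        The percentages quoted in Table 1 appear to correspond to the percentage space
saving, i.e. (1 - compressed size / original size) × 100, reported to the nearest whole
percent. As a check, the short program below recomputes the sgjm1.txt row (34.8%, 59.3%,
51.1% and 69.5%, which the table reports as 35%, 59%, 51% and 69%).

    /* Space saving as reported in Table 1: (1 - compressed/original) * 100.
       The sizes below are the sgjm1.txt row of the table.                  */
    #include <stdio.h>

    static double saving(long original, long compressed)
    {
        return 100.0 * (1.0 - (double)compressed / (double)original);
    }

    int main(void)
    {
        const long orig = 5046;                               /* sgjm1.txt  */
        printf("LZW         : %.1f%%\n", saving(orig, 3292)); /* table: 35% */
        printf("WINRAR 3.93 : %.1f%%\n", saving(orig, 2056)); /* table: 59% */
        printf("WINZIP 10.0 : %.1f%%\n", saving(orig, 2468)); /* table: 51% */
        printf("Proposed    : %.1f%%\n", saving(orig, 1537)); /* table: 69% */
        return 0;
    }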
6. CONCLUSIONS
        In this paper, a new text compression algorithm for compressing different types of
text files has been introduced. The main advantage of this compression scheme is that it
gives better compression than existing algorithms for text files of different sizes, and
its compression ratio compares favourably with those of the Lempel-Ziv-Welch algorithm,
Winzip 10.0 and Winrar 3.93.
REFERENCES
[1] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding”,
    IEEE Transactions on Information Theory, Vol. 24, pp. 530-536, 1978.
[2] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression”, IEEE
    Transactions on Information Theory, Vol. 23, pp. 337-343, May 1977.
[3] Gonzalo Navarro and Mathieu Raffinot, “A General Practical Approach to Pattern
    Matching over Ziv-Lempel Compressed Text”, Proc. CPM’99, LNCS 1645, pp. 14-36.
[4] S. Bhattacharjee, J. Bhattacharya, U. Raghavendra, D. Saha, P. Pal Chaudhuri, “A
    VLSI architecture for cellular automata based parallel data compression”, IEEE-2006,
    Bangalore, India, Jan 03-06.
[5] Khalid Sayood, “An Introduction to Data Compression”, Academic Press, 1996.
[6] David Salomon, “Data Compression: The Complete Reference”, Springer, 2000.
[7] M. Atallah and Y. Genin, “Pattern matching text compression: Algorithmic and
    empirical results”, International Conference on Data Compression, Vol. II, pp.
    349-352, Lausanne, 1996.
[8] Mark Nelson and Jean-Loup Gailly, “The Data Compression Book”, Second Edition,
    M&T Books.
[9] Timothy C. Bell, “Text Compression”, Prentice Hall, 1990.
[10] Ranjan Parekh, “Principles of Multimedia”, Tata McGraw-Hill, 2006.
[11] Amiya Halder, Sourav Dey, Soumyodeep Mukherjee and Ayan Banerjee, “An Efficient
    Image Compression Algorithm Based on Block Optimization and Byte Compression”,
    ICISA-2010, Chennai, Tamilnadu, India, pp. 14-18, Feb 6, 2010.
[12] Ayan Banerjee and Amiya Halder, “An Efficient Image Compression Algorithm Based on
    Block Optimization, Byte Compression and Run-Length Encoding along Y-axis”, IEEE
    ICCSIT 2010, Chengdu, China, IEEE Computer Society Press, July 9-11, 2010.
[13] Rafael C. Gonzalez and Richard E. Woods, “Digital Image Processing”.
[14] Debashis Chakraborty, Sutirtha Ghosh and Joydeep Mukherjee, “An Efficient Data
    Compression Algorithm Using Differential Feature Extraction”, NCETCS, August 26-28,
    2010.




                                                 46

Weitere ähnliche Inhalte

Was ist angesagt?

Comparision Of Various Lossless Image Compression Techniques
Comparision Of Various Lossless Image Compression TechniquesComparision Of Various Lossless Image Compression Techniques
Comparision Of Various Lossless Image Compression TechniquesIJERA Editor
 
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting Technique
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting TechniqueLossless Data Compression Using Rice Algorithm Based On Curve Fitting Technique
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting TechniqueIRJET Journal
 
Multi Document Text Summarization using Backpropagation Network
Multi Document Text Summarization using Backpropagation NetworkMulti Document Text Summarization using Backpropagation Network
Multi Document Text Summarization using Backpropagation NetworkIRJET Journal
 
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONcsandit
 
Highly secure scalable compression of encrypted images
Highly secure scalable compression of encrypted imagesHighly secure scalable compression of encrypted images
Highly secure scalable compression of encrypted imageseSAT Journals
 
BPSC Previous Year Question for AP, ANE, AME, ADA, AE
BPSC Previous Year Question for AP, ANE, AME, ADA, AE BPSC Previous Year Question for AP, ANE, AME, ADA, AE
BPSC Previous Year Question for AP, ANE, AME, ADA, AE Engr. Md. Jamal Uddin Rayhan
 
AN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHY
AN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHYAN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHY
AN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHYJournal For Research
 
Radical Data Compression Algorithm Using Factorization
Radical Data Compression Algorithm Using FactorizationRadical Data Compression Algorithm Using Factorization
Radical Data Compression Algorithm Using FactorizationCSCJournals
 
AUTOCONFIGURATION ALGORITHM FOR A MULTIPLE INTERFACES ADHOC NETWORK RUNNING...
AUTOCONFIGURATION ALGORITHM FOR A  MULTIPLE INTERFACES ADHOC NETWORK  RUNNING...AUTOCONFIGURATION ALGORITHM FOR A  MULTIPLE INTERFACES ADHOC NETWORK  RUNNING...
AUTOCONFIGURATION ALGORITHM FOR A MULTIPLE INTERFACES ADHOC NETWORK RUNNING...IJCNC
 
Bank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AE
Bank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AEBank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AE
Bank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AEEngr. Md. Jamal Uddin Rayhan
 
Dynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentationDynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentationeSAT Journals
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data StreamIRJET Journal
 
Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...
Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...
Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...IJERA Editor
 
Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...
Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...
Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...IDES Editor
 
A novel efficient multiple encryption algorithm for real time images
A novel efficient multiple encryption algorithm for real time images A novel efficient multiple encryption algorithm for real time images
A novel efficient multiple encryption algorithm for real time images IJECEIAES
 
Improved wolf algorithm on document images detection using optimum mean techn...
Improved wolf algorithm on document images detection using optimum mean techn...Improved wolf algorithm on document images detection using optimum mean techn...
Improved wolf algorithm on document images detection using optimum mean techn...journalBEEI
 
Pioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block CodingPioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block CodingDR.P.S.JAGADEESH KUMAR
 
High capacity histogram shifting based reversible data hiding with data compr...
High capacity histogram shifting based reversible data hiding with data compr...High capacity histogram shifting based reversible data hiding with data compr...
High capacity histogram shifting based reversible data hiding with data compr...IAEME Publication
 

Was ist angesagt? (20)

Comparision Of Various Lossless Image Compression Techniques
Comparision Of Various Lossless Image Compression TechniquesComparision Of Various Lossless Image Compression Techniques
Comparision Of Various Lossless Image Compression Techniques
 
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting Technique
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting TechniqueLossless Data Compression Using Rice Algorithm Based On Curve Fitting Technique
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting Technique
 
Multi Document Text Summarization using Backpropagation Network
Multi Document Text Summarization using Backpropagation NetworkMulti Document Text Summarization using Backpropagation Network
Multi Document Text Summarization using Backpropagation Network
 
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
 
Highly secure scalable compression of encrypted images
Highly secure scalable compression of encrypted imagesHighly secure scalable compression of encrypted images
Highly secure scalable compression of encrypted images
 
BPSC Previous Year Question for AP, ANE, AME, ADA, AE
BPSC Previous Year Question for AP, ANE, AME, ADA, AE BPSC Previous Year Question for AP, ANE, AME, ADA, AE
BPSC Previous Year Question for AP, ANE, AME, ADA, AE
 
Eg25807814
Eg25807814Eg25807814
Eg25807814
 
AN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHY
AN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHYAN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHY
AN INNOVATIVE IDEA FOR PUBLIC KEY METHOD OF STEGANOGRAPHY
 
Radical Data Compression Algorithm Using Factorization
Radical Data Compression Algorithm Using FactorizationRadical Data Compression Algorithm Using Factorization
Radical Data Compression Algorithm Using Factorization
 
Is3314841490
Is3314841490Is3314841490
Is3314841490
 
AUTOCONFIGURATION ALGORITHM FOR A MULTIPLE INTERFACES ADHOC NETWORK RUNNING...
AUTOCONFIGURATION ALGORITHM FOR A  MULTIPLE INTERFACES ADHOC NETWORK  RUNNING...AUTOCONFIGURATION ALGORITHM FOR A  MULTIPLE INTERFACES ADHOC NETWORK  RUNNING...
AUTOCONFIGURATION ALGORITHM FOR A MULTIPLE INTERFACES ADHOC NETWORK RUNNING...
 
Bank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AE
Bank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AEBank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AE
Bank Question Solution-ADBA Previous Year Question for AP, ANE, AME, ADA, AE
 
Dynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentationDynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentation
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data Stream
 
Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...
Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...
Comparative Analysis of Lossless Image Compression Based On Row By Row Classi...
 
Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...
Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...
Automatic Synthesis and Formal Verification of Interfaces Between Incompatibl...
 
A novel efficient multiple encryption algorithm for real time images
A novel efficient multiple encryption algorithm for real time images A novel efficient multiple encryption algorithm for real time images
A novel efficient multiple encryption algorithm for real time images
 
Improved wolf algorithm on document images detection using optimum mean techn...
Improved wolf algorithm on document images detection using optimum mean techn...Improved wolf algorithm on document images detection using optimum mean techn...
Improved wolf algorithm on document images detection using optimum mean techn...
 
Pioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block CodingPioneering VDT Image Compression using Block Coding
Pioneering VDT Image Compression using Block Coding
 
High capacity histogram shifting based reversible data hiding with data compr...
High capacity histogram shifting based reversible data hiding with data compr...High capacity histogram shifting based reversible data hiding with data compr...
High capacity histogram shifting based reversible data hiding with data compr...
 

Andere mochten auch

Improving the global parameter signal to distortion value in music signals
Improving the global parameter signal to distortion value in music signalsImproving the global parameter signal to distortion value in music signals
Improving the global parameter signal to distortion value in music signalsiaemedu
 
A comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueA comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueiaemedu
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presenceiaemedu
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationiaemedu
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanismiaemedu
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationiaemedu
 

Andere mochten auch (7)

Improving the global parameter signal to distortion value in music signals
Improving the global parameter signal to distortion value in music signalsImproving the global parameter signal to distortion value in music signals
Improving the global parameter signal to distortion value in music signals
 
A comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueA comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining technique
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
 
college
collegecollege
college
 

Ähnlich wie Efficient text compression using special character replacement

OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMOPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMJitendra Choudhary
 
Lossless LZW Data Compression Algorithm on CUDA
Lossless LZW Data Compression Algorithm on CUDALossless LZW Data Compression Algorithm on CUDA
Lossless LZW Data Compression Algorithm on CUDAIOSR Journals
 
Automation tool for evaluation of the quality of nlp based
Automation tool for evaluation of the quality of nlp basedAutomation tool for evaluation of the quality of nlp based
Automation tool for evaluation of the quality of nlp basedIAEME Publication
 
A comprehensive study of non blocking joining techniques
A comprehensive study of non blocking joining techniquesA comprehensive study of non blocking joining techniques
A comprehensive study of non blocking joining techniquesIAEME Publication
 
Automatic document clustering
Automatic document clusteringAutomatic document clustering
Automatic document clusteringIAEME Publication
 
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSSVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSijscmcj
 
Compiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaCompiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaSrikanth Vanama
 
VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...
VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...
VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...VLSICS Design
 
VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...
VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...
VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...VLSICS Design
 
The machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svmThe machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svmIAEME Publication
 
The machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svmThe machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svmIAEME Publication
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016ijcsbi
 
Cj32980984
Cj32980984Cj32980984
Cj32980984IJMER
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...IRJET Journal
 
Instruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionInstruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionIAEME Publication
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 

Ähnlich wie Efficient text compression using special character replacement (20)

OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHMOPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
OPTIMIZATION OF LZ77 DATA COMPRESSION ALGORITHM
 
50120130405006
5012013040500650120130405006
50120130405006
 
Lossless LZW Data Compression Algorithm on CUDA
Lossless LZW Data Compression Algorithm on CUDALossless LZW Data Compression Algorithm on CUDA
Lossless LZW Data Compression Algorithm on CUDA
 
Compression tech
Compression techCompression tech
Compression tech
 
Automation tool for evaluation of the quality of nlp based
Automation tool for evaluation of the quality of nlp basedAutomation tool for evaluation of the quality of nlp based
Automation tool for evaluation of the quality of nlp based
 
A comprehensive study of non blocking joining techniques
A comprehensive study of non blocking joining techniquesA comprehensive study of non blocking joining techniques
A comprehensive study of non blocking joining techniques
 
Automatic document clustering
Automatic document clusteringAutomatic document clustering
Automatic document clustering
 
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSSVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
 
Compiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaCompiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_Vanama
 
VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...
VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...
VLSI Architecture for Nano Wire Based Advanced Encryption Standard (AES) with...
 
VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...
VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...
VLSI ARCHITECTURE FOR NANO WIRE BASED ADVANCED ENCRYPTION STANDARD (AES) WITH...
 
The machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svmThe machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svm
 
The machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svmThe machine learning method regarding efficient soft computing and ict using svm
The machine learning method regarding efficient soft computing and ict using svm
 
XML
XMLXML
XML
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016
 
Cj32980984
Cj32980984Cj32980984
Cj32980984
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
 
Instruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionInstruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch prediction
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Data compression
Data compression Data compression
Data compression
 

Mehr von iaemedu

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech iniaemedu
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridiaemedu
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingiaemedu
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reorderingiaemedu
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challengesiaemedu
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cmaiaemedu
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineeringiaemedu
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobilesiaemedu
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approachiaemedu
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentiaemedu
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationiaemedu
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksiaemedu
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyiaemedu
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryiaemedu
 
A comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’sA comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’siaemedu
 
The detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networksThe detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networksiaemedu
 
Visual cryptography scheme for color images
Visual cryptography scheme for color imagesVisual cryptography scheme for color images
Visual cryptography scheme for color imagesiaemedu
 
Software process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various modelsSoftware process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various modelsiaemedu
 
Software metric analysis methods for product development
Software metric analysis methods for product developmentSoftware metric analysis methods for product development
Software metric analysis methods for product developmentiaemedu
 

Mehr von iaemedu (20)

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech in
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using grid
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cma
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
 
A comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’sA comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’s
 
The detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networksThe detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networks
 
Visual cryptography scheme for color images
Visual cryptography scheme for color imagesVisual cryptography scheme for color images
Visual cryptography scheme for color images
 
Software process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various modelsSoftware process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various models
 
Software metric analysis methods for product development
Software metric analysis methods for product developmentSoftware metric analysis methods for product development
Software metric analysis methods for product development
 

Efficient text compression using special character replacement

  • 1. International Journal of Computer and Technology (IJCET), ISSN 0976 – 6367(Print), International Journal of Computer Engineering Engineering ISSN 0976 – 6375(Online) Volume 1, Number 2, Sept – Oct (2010), © IAEME and Technology (IJCET), ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 1 IJCET Number 2, Sept - Oct (2010), pp. 38-46 ©IAEME © IAEME, http://www.iaeme.com/ijcet.html EFFICIENT TEXT COMPRESSION USING SPECIAL CHARACTER REPLACEMENT AND SPACE REMOVAL Debashis Chakraborty Department of Computer Science & Engineering St. Thomas’ College of Engineering. & Technology Kolkata-23, West Bengal E-Mail: sunnydeba@gmail.com Sutirtha Ghosh Department of Information Technology St. Thomas’ College of Engineering. & Technology Kolkata-23, West Bengal E-Mail: sutirtha84@yahoo.co.in Joydeep Mukherjee Department of Information Technology St. Thomas’ College of Engineering. & Technology Kolkata-23, West Bengal ABSTRACT In this paper, we have proposed a new concept of text compression/decompression algorithm using special character replacement technique. Moreover after the initial compression after replacement of special characters, we remove the spaces between the words in the intermediary compressed file in specific situations to get the final compressed text file. Experimental results show that the proposed algorithm is very simple in implementation, fast in encoding time and high in compression ratio and even gives better compression than existing algorithms like LZW, WINZIP 10.0 and WINRAR 3.93. Keywords: Lossless compresssion; Lossy compression; Non-printable ASCII value; Special character, Index, Symbols. 38
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sept – Oct (2010), © IAEME INTRODUCTION As evident from the name itself data compression is concerned with the compression of a given set of data [5,6,8]. The primary reason behind doing so is to reduce the storage space required to save the data, or the bandwidth required to transmit it. Although storage technology has developed significantly over the past decade, the same cannot be said for transmission capacity. As a result the concept of compressing data becomes very important. Data compression or source coding is the process of encoding information using fewer bits (or other information bearing units) than an unencoded representation would use through specific use of encoding schemes. It follows that the receiver must be aware of the encoding scheme in order to decode the data to its original form. The compression schemes that are designed are basically trade- offs among the degree of data compression, the amount of distortion introduced and the resources (software and hardware) required to compress and decompress data [5,9]. Data compression schemes may broadly be classified into – 1.Lossless compression and 2.Lossy compression. Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender’s data more concisely without error. Lossless compression is possible because most real world data has statistical redundancy. Another kind of compression, called lossy data compression is possible if some loss of fidelity is acceptable. It is important to consider that in case of lossy compression, the original data cannot be reconstructed from the compressed data due to rounding off or removal of some parts of data as a result of redundancies. These types of compression are also widely used in Image compression [10, 11, 12, 13]. The theoretical background of compression is provided by information theory and by rate distortion theory. There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution), while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as justification for data compression as a benchmark for "general intelligence". We hereby focus on the compression of text. Various algorithms have been proposed for text compression 39
We propose an efficient text compression algorithm that yields better compression than existing algorithms such as Lempel-Ziv-Welch coding and existing software such as Winzip 10.0 and Winrar 3.93, while ensuring that the compression remains lossless. The proposed algorithm is based on a systematic special character replacement technique.

The rest of this paper is organized as follows. Section 1 presents the concept of special character replacement. Section 2 describes the creation and maintenance of the dynamic dictionary. Section 3 describes the removal of spaces between special symbols in the intermediary compressed file. Section 4 gives the proposed algorithm, Section 5 reports the experimental results, and Section 6 concludes the paper.

1. SPECIAL CHARACTER REPLACEMENT

In the proposed algorithm every word in a text is replaced with an ASCII character. The extended ASCII set contains two hundred and fifty-four (254) characters. Among these, some represent NULL, space, linefeed or the English alphabets; neglecting them, one hundred and eighty-four (184) ASCII characters are used in this algorithm.

One- and two-letter English words in the text file are not replaced with an ASCII character; a non-printable ASCII character replaces only words having more than two letters. For example, the word 'of' remains unchanged, whereas the non-printable ASCII character with index '1' replaces the word 'name'. Whenever a new word is found, an index (integer) is assigned to it and the corresponding special ASCII character replaces the word in the compressed text file. When the word is repeated later in the file, it is replaced by the same ASCII value assigned to it previously. The one hundred and eighty-four symbols cover the first one hundred and eighty-four distinct words; once the number of words exceeds this value, ASCII characters are combined to generate new symbols for the further words in the text file.

When a space is encountered between two words, it is replaced with the integer '0'. The symbol '9' marks the end of a statement, so that the termination of a sentence can be identified during decompression of the text file from the compressed file.
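To make the mapping from word indices to symbols concrete, a small C sketch is given below. The paper does not list the exact one hundred and eighty-four usable byte values, so the pool construction here (every byte value except whitespace, the English letters and the reserved digits '0' and '9') is only an illustrative assumption, and the resulting pool is not exactly 184 entries; the point of the sketch is that the first pool_size indices need one symbol byte and later indices need a two-byte combination.

#include <ctype.h>
#include <stdio.h>

static unsigned char pool[256];   /* usable symbol bytes (assumed set) */
static int pool_size = 0;

/* Build the pool of symbol bytes: skip NUL, whitespace, letters and the
   digits '0' and '9' reserved as markers (an assumption for illustration). */
static void build_pool(void)
{
    int c;
    for (c = 1; c < 256; c++) {
        if (isspace(c) || isalpha(c) || c == '0' || c == '9')
            continue;
        pool[pool_size++] = (unsigned char)c;
    }
}

/* Emit the symbol for a 1-based dictionary index: one pool byte for the
   first pool_size words, a two-byte combination afterwards.
   Returns the symbol length in bytes. */
static int index_to_symbol(int index, unsigned char out[2])
{
    if (index <= pool_size) {
        out[0] = pool[index - 1];
        return 1;
    }
    index -= pool_size + 1;
    out[0] = pool[index / pool_size];
    out[1] = pool[index % pool_size];
    return 2;
}

int main(void)
{
    unsigned char sym[2];
    build_pool();
    printf("pool holds %d single-byte symbols\n", pool_size);
    printf("index 1 needs %d byte(s)\n", index_to_symbol(1, sym));
    printf("index 500 needs %d byte(s)\n", index_to_symbol(500, sym));
    return 0;
}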
For example, suppose there is a line of text: My name is Debashis Chakraborty. Assuming this is the first sentence in the file, the words 'My' and 'is' are kept unchanged in the compressed file. 'Name', being the first word to be compressed, is assigned the index '1' and is replaced with the ASCII character for '1'. A similar process is repeated for the other words whose length is greater than two. The spaces between words are replaced with '0' and the '.' with '9'. The corresponding compressed sentence for the above example is therefore: My0$0is#0&9, where $, # and & are the non-printable ASCII characters for the indices '1' (for 'name'), '2' (for 'Debashis') and '3' (for 'Chakraborty') respectively, each occupying one byte of memory. The original line of text occupies 32 bytes, whereas the compressed line occupies 12 bytes. The proposed method thus enables comprehensive compression of text, resulting in better transmission bandwidth management and lower storage requirements.

2. CREATION OF DYNAMIC DICTIONARY

Text compression algorithms should always be lossless, preventing any loss of information: the text file regenerated from the compressed file must be identical to the original file. All text compression algorithms maintain a dictionary containing the words that appear in the text file, and the text file is regenerated from the compressed file with the help of this dictionary. The dictionary maintained can be either static or dynamic; in the proposed algorithm a dynamic dictionary is used.

A table containing the fields 'Index', 'Symbol' and 'Word' forms the dictionary. Initially the table is empty. When a word to be compressed is encountered in the text file, the algorithm checks whether the word already exists in the table. Every time a new word is found, an integer value is assigned to it and its special symbol is tabulated, using either a single non-printable ASCII character or a combination of such characters. The assigned integer is stored under the Index field, the special symbol under the Symbol field and the corresponding word under the Word field. Every time a new word is found in the text, the dictionary is updated using the same procedure; when a word is repeated, the symbol already assigned to it is used. During decompression, the special symbol in the compressed file is searched in the dictionary to obtain its index and the corresponding word, and finally the symbols are replaced with their corresponding words to regenerate the original file.
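The paper does not prescribe a data structure for the dictionary; the sketch below assumes a fixed-capacity array searched linearly, with the printable '$' standing in for the real non-printable symbol byte, purely to show the Index/Symbol/Word fields and the new-word versus repeated-word cases.

#include <stdio.h>
#include <string.h>

#define MAX_WORDS 4096
#define MAX_WORD_LEN 64

struct entry {
    int  index;                    /* integer assigned to the word        */
    char symbol[3];                /* one or two symbol bytes, NUL-ended  */
    char word[MAX_WORD_LEN];       /* the original word                   */
};

static struct entry dict[MAX_WORDS];
static int dict_count = 0;

/* Look a word up; return its entry or NULL if it has not been seen yet. */
static struct entry *lookup_word(const char *word)
{
    int i;
    for (i = 0; i < dict_count; i++)
        if (strcmp(dict[i].word, word) == 0)
            return &dict[i];
    return NULL;
}

/* Add a new word with its pre-computed symbol and return its entry. */
static struct entry *add_word(const char *word, const char *symbol)
{
    struct entry *e;
    if (dict_count >= MAX_WORDS)
        return NULL;
    e = &dict[dict_count];
    e->index = dict_count + 1;                 /* indices start at 1      */
    strncpy(e->symbol, symbol, sizeof e->symbol - 1);
    e->symbol[sizeof e->symbol - 1] = '\0';
    strncpy(e->word, word, sizeof e->word - 1);
    e->word[sizeof e->word - 1] = '\0';
    dict_count++;
    return e;
}

int main(void)
{
    /* "name" is new, so it gets index 1; a repeat reuses the same entry. */
    if (!lookup_word("name"))
        add_word("name", "$");                 /* '$' stands in for the   */
    printf("name -> index %d, symbol %s\n",    /* real non-printable byte */
           lookup_word("name")->index, lookup_word("name")->symbol);
    return 0;
}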
3. REMOVAL OF SPACES FROM THE INTERMEDIARY COMPRESSED FILE

Every text file contains spaces that separate the words from each other. Here we propose a method to remove these spaces, without losing any information, to obtain better compression. Replacing every word of the original file with a special symbol reduces the size of the file and yields an intermediary compressed file. The spaces are not removed from this intermediary file; instead, it contains '0' to mark the location of each space between words.

Every word of the original file is replaced by either one special symbol or a combination of two special symbols. In the intermediary compressed file, when a '0' follows a single special symbol, the contents are not modified. However, when a word is replaced by a combination of two symbols, or when the word is a one- or two-letter word (left unchanged in the intermediary file), the '0' after it is removed, i.e. the space between the present word and the next word is dropped.

For example, suppose there is a line of text: My name is Debashis Chakraborty. Assuming the special symbol for 'name' is '$', for 'Debashis' is '##' and for 'Chakraborty' is '@', the output of the above sentence after the final compression is: My$0is##@9
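The separator rule can be shown in isolation with the short sketch below. The token structure (a word's replacement text plus a flag saying whether it was replaced at all) is an assumption made so that the decision is visible on its own; with the symbols assumed above it reproduces the final output My$0is##@9.

#include <stdio.h>
#include <string.h>

struct token {
    const char *text;   /* symbol bytes or the literal short word   */
    int replaced;       /* 1 if the word was replaced by symbol(s)  */
};

/* Keep the '0' separator only after a word replaced by a single symbol. */
static int keep_separator(const struct token *t)
{
    return t->replaced && strlen(t->text) == 1;
}

int main(void)
{
    /* "My name is Debashis Chakraborty." with the symbols assumed above:
       name -> "$", Debashis -> "##", Chakraborty -> "@".                 */
    struct token line[] = {
        { "My", 0 }, { "$", 1 }, { "is", 0 }, { "##", 1 }, { "@", 1 }
    };
    int i, n = (int)(sizeof line / sizeof line[0]);
    for (i = 0; i < n; i++) {
        printf("%s", line[i].text);
        if (i + 1 < n && keep_separator(&line[i]))
            printf("0");
    }
    printf("9\n");      /* '9' marks the end of the sentence */
    return 0;
}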
4. PROPOSED ALGORITHM

The proposed algorithm takes a text file as input and can compress text files to sizes comparable with the Lempel-Ziv-Welch algorithm, Winzip 10.0 and Winrar 3.93. The proposed algorithm is:

Algorithm for Compression

Step 1: Read the contents of the text file, one word at a time.

Step 2: Create a dictionary containing the fields 'Index', 'Symbol' and 'Word'. The dictionary is initially empty.

Step 3: Length calculation
• Calculate the length of the word read from the text file.
• Write the original word into the intermediary file if the length of the word is less than or equal to two.
• If the read character is '.', write '9' into the intermediary file; if it is a space between two words, write '0'.
• A word of length greater than two is replaced with a special symbol (a non-printable ASCII character or a combination of ASCII characters).

Step 4: Special character replacement
• Check whether the word exists in the dictionary.
• For a new word, assign an integer which acts as the index, and a special symbol for the corresponding word.
• If the index of the word is at most one hundred and eighty-four, assign the index's respective single-character ASCII symbol as its special symbol.
• If a word has an index greater than one hundred and eighty-four, combine ASCII characters to form the new symbol.
• Update the dictionary by inserting the new word along with its index value and assigned symbol for future reference.
• For a repetition of an existing word, use the symbol already assigned to the word, as obtained from the dictionary.

Step 5: Continue the above process of compression and updating of the dynamic dictionary till the end of the original file is reached.

Step 6: Removal of spaces from the intermediary file
• Read the contents of the intermediary file, one special symbol at a time.
• Check whether the word in the original file was replaced by one special character or by a combination of two.
• If there is a '0' after a symbol consisting of one special character, retain the zero.
• If a word is represented by a combination of special characters, or appears as the word itself (one- or two-letter words), remove the '0' (representing the space between words) to obtain the final compressed file.

Step 7: Continue the above process till the end of the intermediary file is reached.
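A condensed, self-contained C sketch of Steps 1 to 5 is given below: it tokenises one sentence, builds the dictionary on the fly and writes the intermediary form with '0' separators and the closing '9'. The symbol bytes (0x80 upward), the fixed capacities and the tokenisation on single spaces are simplifying assumptions; Step 6, the removal of separators, would then run as a separate pass as sketched in the previous section.

#include <stdio.h>
#include <string.h>

#define MAX_WORDS 256

static char dict[MAX_WORDS][32];
static int  dict_count = 0;

/* Return the 1-based index of a word, adding it if it is new. */
static int word_index(const char *w)
{
    int i;
    for (i = 0; i < dict_count; i++)
        if (strcmp(dict[i], w) == 0)
            return i + 1;
    strncpy(dict[dict_count], w, 31);   /* capacity check omitted for brevity */
    return ++dict_count;
}

/* Produce the intermediary form of one sentence (Steps 1-5 only). */
static void compress_sentence(const char *text, FILE *out)
{
    char word[32];
    int len = 0;
    for (;; text++) {
        if (*text && *text != ' ' && *text != '.') {       /* inside a word    */
            if (len < 31) word[len++] = *text;
            continue;
        }
        word[len] = '\0';
        if (len > 0) {
            if (len <= 2)
                fputs(word, out);                          /* keep short words */
            else
                fputc(0x80 + word_index(word) - 1, out);   /* assumed symbol   */
        }
        len = 0;
        if (*text == '.')      fputc('9', out);            /* end of sentence  */
        else if (*text == ' ') fputc('0', out);            /* word separator   */
        else break;                                        /* end of string    */
    }
}

int main(void)
{
    compress_sentence("My name is Debashis Chakraborty.", stdout);
    fputc('\n', stdout);
    return 0;
}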
Algorithm for Decompression

Step 1: Read a symbol from the compressed file.

Step 2: If the read symbol is '0', replace it with a space or tab. If the symbol is '9', replace it with a '.' to indicate the end of a sentence.

Step 3: Decoding of special characters
• If the symbol read from the compressed file is an English alphabet, write it unchanged into the decompressed file, and write a space or tab after the word.
• For a special symbol, find its match in the dictionary and write the corresponding word into the decompressed file.
• If the special symbol is a combination of two special characters, write a space or tab after the corresponding word in the decompressed file.

Step 4: Continue the above process till the end of the compressed file is reached.
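The following sketch decodes the intermediary form produced by the compression sketch above: '0' becomes a space, '9' closes the sentence with '.', letters are copied through and any other byte is looked up as a symbol. The tiny fixed dictionary mirrors the running example and is only illustrative; decoding the final file, in which some separators were removed, would additionally re-insert a space after literal words and after two-byte symbols, as Step 3 describes.

#include <ctype.h>
#include <stdio.h>

/* Assumed symbol assignments from the compression sketch: 0x80 upward. */
static const char *lookup_symbol(unsigned char sym)
{
    static const char *words[] = { "name", "Debashis", "Chakraborty" };
    if (sym >= 0x80 && sym < 0x80 + 3)
        return words[sym - 0x80];
    return "?";
}

static void decompress(const unsigned char *in, FILE *out)
{
    for (; *in; in++) {
        if (*in == '0')           fputc(' ', out);         /* separator          */
        else if (*in == '9')      fputc('.', out);         /* end of sentence    */
        else if (isalpha(*in))    fputc(*in, out);         /* literal short word */
        else                      fputs(lookup_symbol(*in), out);
    }
}

int main(void)
{
    const unsigned char compressed[] = { 'M','y','0',0x80,'0','i','s','0',
                                         0x81,'0',0x82,'9','\0' };
    decompress(compressed, stdout);
    fputc('\n', stdout);     /* prints: My name is Debashis Chakraborty. */
    return 0;
}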
5. EXPERIMENTAL RESULTS

The algorithm developed has been simulated using TURBO C. The input text files considered are .txt, .rtf, .cpp and .c files of different sizes. The compression ratios obtained are tabulated in Table 1. The compression ratio is better than that of the Lempel-Ziv-Welch algorithm, Winzip 10.0 and Winrar 3.93 for the majority of the text files. All the text files reconstructed from the compressed files are of the same size as the original files; therefore the proposed algorithm is a lossless compression scheme.

Original File | Original File Size | Compression by LZW | Compression by WINRAR 3.93 | Compression by WINZIP 10.0 | Compression by Proposed Algorithm
sgjm1.txt  | 5046 bytes  | 3292 bytes (35%) | 2056 bytes (59%) | 2468 bytes (51%) | 1537 bytes (69%)
sgjm2.txt  | 7061 bytes  | 4357 bytes (38%) | 2565 bytes (63%) | 2547 bytes (64%) | 2129 bytes (70%)
sgjm3.rtf  | 2891 bytes  | 1842 bytes (36%) | 1303 bytes (55%) | 1269 bytes (56%) | 896 bytes (69%)
sgjm4.txt  | 431 bytes   | 388 bytes (9%)   | 260 bytes (39%)  | 231 bytes (46%)  | 158 bytes (63%)
sgjm5.rtf  | 2037 bytes  | 1330 bytes (35%) | 859 bytes (58%)  | 828 bytes (59%)  | 635 bytes (63%)
sgjm6.txt  | 3369 bytes  | 2196 bytes (35%) | 1545 bytes (54%) | 1504 bytes (55%) | 1110 bytes (67%)
sgjm7.txt  | 10549 bytes | 5457 bytes (47%) | 3933 bytes (63%) | 3923 bytes (63%) | 3492 bytes (66%)
sgjm8.txt  | 7584 bytes  | 4216 bytes (44%) | 3067 bytes (59%) | 3048 bytes (60%) | 2389 bytes (68%)
sgjm9.rtf  | 5529 bytes  | 3249 bytes (41%) | 2351 bytes (57%) | 2324 bytes (58%) | 1793 bytes (67%)
sgjm10.rtf | 4152 bytes  | 2658 bytes (36%) | 1869 bytes (55%) | 1831 bytes (56%) | 1428 bytes (66%)
sgjm11.cpp | 458 bytes   | 421 bytes (8%)   | 259 bytes (43%)  | 239 bytes (48%)  | 134 bytes (70%)

Table 1: Compression of text files for different algorithms
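For reference, the percentages in Table 1 correspond to the space saved relative to the original file size. The byte counts below are taken from the first row of the table; the exact rounding used by the authors is not stated, so this small check prints the savings to one decimal place rather than asserting the rounded figures.

#include <stdio.h>

/* Space saved as a percentage of the original size. */
static double saving_percent(double original, double compressed)
{
    return 100.0 * (original - compressed) / original;
}

int main(void)
{
    double original = 5046.0;                                 /* sgjm1.txt   */
    printf("LZW      : %.1f%%\n", saving_percent(original, 3292));  /* table: 35% */
    printf("WINRAR   : %.1f%%\n", saving_percent(original, 2056));  /* table: 59% */
    printf("WINZIP   : %.1f%%\n", saving_percent(original, 2468));  /* table: 51% */
    printf("Proposed : %.1f%%\n", saving_percent(original, 1537));  /* table: 69% */
    return 0;
}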
6. CONCLUSIONS

In this paper a new text compression algorithm for compressing different types of text files has been introduced. The main advantage of this compression scheme is that the algorithm gives better compression than existing algorithms for different text file sizes, and it is comparable to the Lempel-Ziv-Welch algorithm, Winzip 10.0 and Winrar 3.93 in terms of compression ratio.

REFERENCES

[1] J. Ziv and A. Lempel, "Compression of individual sequences via variable length coding", IEEE Transactions on Information Theory, Vol. 24, pp. 530-536, 1978.
[2] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression", IEEE Transactions on Information Theory, Vol. 23, pp. 337-343, May 1977.
[3] Gonzalo Navarro and Mathieu Raffinot, "A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text", Proc. CPM'99, LNCS 1645, pp. 14-36.
[4] S. Bhattacharjee, J. Bhattacharya, U. Raghavendra, D. Saha, P. Pal Chaudhuri, "A VLSI architecture for cellular automata based parallel data compression", IEEE-2006, Bangalore, India, Jan 03-06.
[5] Khalid Sayood, "An Introduction to Data Compression", Academic Press, 1996.
[6] David Salomon, "Data Compression: The Complete Reference", Springer, 2000.
[7] M. Atallah and Y. Genin, "Pattern matching text compression: Algorithmic and empirical results", International Conference on Data Compression, Vol. II, pp. 349-352, Lausanne, 1996.
[8] Mark Nelson and Jean-Loup Gailly, "The Data Compression Book", Second Edition, M&T Books.
[9] Timothy C. Bell, "Text Compression", Prentice Hall, 1990.
[10] Ranjan Parekh, "Principles of Multimedia", Tata McGraw-Hill, 2006.
[11] Amiya Halder, Sourav Dey, Soumyodeep Mukherjee and Ayan Banerjee, "An Efficient Image Compression Algorithm Based on Block Optimization and Byte Compression", ICISA-2010, Chennai, Tamilnadu, India, pp. 14-18, Feb 6, 2010.
[12] Ayan Banerjee and Amiya Halder, "An Efficient Image Compression Algorithm Based on Block Optimization, Byte Compression and Run-Length Encoding along Y-axis", IEEE ICCSIT 2010, Chengdu, China, IEEE Computer Society Press, July 9-11, 2010.
[13] Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing".
[14] Debashis Chakraborty, Sutirtha Ghosh and Joydeep Mukherjee, "An Efficient Data Compression Algorithm Using Differential Feature Extraction", NCETCS, August 26-28, 2010.