Hello,
I have a number of text files (3,000 of them) that were compressed and encoded by a piece of software. How can I work out the algorithm that was applied to these files?
I found the following facts about this algorithm:
- Each file consists of at most 8 lines.
- Each line consists of exactly 123 characters (including spaces).
- The output file is compressed.
- The output file starts with the bytes 00 00, indicating the beginning of the compressed data.
- Unrepeated characters are encoded using their ASCII values.
- If a character is repeated more than 4 times in a row, it is coded as
00 + repeat count + repeated character
For example, if 'A' appears 5 times, it is encoded as 00 05 41
- Each run of characters coded as plain ASCII is preceded by one byte holding the run length multiplied by 4.
- The compression algorithm should not be very complicated, because these files date from the year 2000.
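To make the two rules above concrete, here is a minimal Python sketch of an encoder that applies only those rules (the 00 00 header, literal runs prefixed by length × 4, and repeats coded as 00 + count + character). The function name and the two limits are my own assumptions, not something taken from the original software. It will not reproduce the real output files, because those contain further tokens that are still unexplained.

```python
from itertools import groupby

HEADER = b"\x00\x00"   # start marker observed in the output files
MAX_LITERALS = 63      # assumption: length * 4 has to fit in one byte
MAX_REPEAT = 255       # assumption: the repeat count is a single byte


def encode_known_rules(data: bytes) -> bytes:
    """Encode data using only the literal-run and repeat rules listed above."""
    out = bytearray(HEADER)
    pending = bytearray()  # literal bytes waiting to be written

    def flush():
        # Write pending literals in chunks, each prefixed by (length * 4).
        for i in range(0, len(pending), MAX_LITERALS):
            chunk = pending[i:i + MAX_LITERALS]
            out.append(len(chunk) * 4)
            out.extend(chunk)
        pending.clear()

    for value, group in groupby(data):
        run = len(list(group))
        if run > 4:
            # Repeated more than 4 times: 00 + count + character.
            flush()
            while run > 0:
                n = min(run, MAX_REPEAT)
                out.extend((0x00, n, value))
                run -= n
        else:
            # Short runs stay in the literal buffer.
            pending.extend([value] * run)
    flush()
    return bytes(out)


print(encode_known_rules(b"A" * 5).hex(" "))
```

Feeding it "Y071115" followed by 11 spaces gives 00 00 1C 59 30 37 31 31 31 35 00 0B 20, which matches that part of the sample output (apart from the header, which appears only once per file).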
I have 3,000 input/output pairs. The following is a sample (it will be clearer if you open the output file in a hex editor):
Input:
SMESP LTFMLTVAPP0120450 F0 RA N41163100E028450700 Y071115 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0120500 K0 RA N41163100E028450700 Y116160 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0120700 K0 RA N41163100E028450700 Y071115 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0121100 K0 RA N41163100E028450700 Y071115 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0121250 F0 RA N41163100E028450700 Y296340 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0122475 T0 RA N41163100E028450700 Y071115 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0122575 T0 RA N41163100E028450700 Y296340 LTFMLTPA YESILKOY APP RADAR
SMESP LTFMLTVAPP0126425 T0 RA N41163100E028450700 Y YESILKOY APP RADAR
Output:
00 00 BC 53 4D 45 53 50 20 4C 54 46 4D 4C 54 56 41 50 50 30 31 32 30 34 35 30 20 46 30 20 20 20 52 41 20 4E 34 31 31 36 33 31 30 30 45 30 32 38 34 35 9D 0C 30 37 30 00 07 20 1C 59 30 37 31 31 31 35 00 0B 20 B1 24 08 50 41 00 0B 20 48 59 45 53 49 4C 4B 4F 59 20 41 50 50 20 52 41 44 41 52 00 07 20
03 B5 07 57 B9 07 35 30 30 20 4B D7 12 01 31 31 36 31 36
C3 BF 07 83 B0 07 91 2F 83 B9 07 C3 6F 0F 43 64 0F
CB BF 07 31 31
C3 BE 07 47 CA 1E 32 D7 1F 17 32 39 36 33 34
03 13 17 DB 6F 0F 32 34 37 35 20 54
83 6D 0F 4B BA 07 32 35
C3 6F 0F 43 64 0F 0F BA 07 36 34 32 00 24 20 43 B6 07
In the above example:
00 00: marks the start of the output file
BC: equals 188. 188/4 = 47, which is the number of characters coded consecutively as ASCII.
53 4D 45 …. 38 34 35: is “SMESP LTFMLTVAPP0120450 F0 RA N41163100E02845” in ASCII.
9D 0C: ?????
30 37 30: is “070” in ASCII.
00 07 20: equals 7 spaces
1C: equals 28. 28/4 = 7, which is the number of characters coded consecutively as ASCII.
And so on…
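The breakdown above can be checked mechanically. Below is a rough Python sketch of a partial decoder that understands only the tokens identified so far and stops at the first byte it cannot explain. The function name is hypothetical, and treating every non-zero control byte that is a multiple of 4 as a literal-run prefix is an assumption based on the two cases seen here (BC and 1C); it may not hold for the whole format.

```python
def parse_known_tokens(data: bytes):
    """Decode known tokens; return (decoded_bytes, offset_where_parsing_stopped)."""
    if data[:2] != b"\x00\x00":
        raise ValueError("missing 00 00 header")
    pos = 2
    decoded = bytearray()
    while pos < len(data):
        control = data[pos]
        if control == 0x00:
            # Repeat token: 00 + count + character.
            if pos + 3 > len(data):
                break
            count, value = data[pos + 1], data[pos + 2]
            decoded.extend(bytes([value]) * count)
            pos += 3
        elif control % 4 == 0:
            # Literal run: the control byte holds (length * 4).
            length = control // 4
            if pos + 1 + length > len(data):
                break
            decoded.extend(data[pos + 1:pos + 1 + length])
            pos += 1 + length
        else:
            # Unexplained token (for example 9D 0C): stop here.
            break
    return bytes(decoded), pos


# Start of the sample output, up to and including the unexplained 9D 0C.
sample = bytes.fromhex(
    "00 00 BC 53 4D 45 53 50 20 4C 54 46 4D 4C 54 56 41 50 50"
    " 30 31 32 30 34 35 30 20 46 30 20 20 20 52 41 20"
    " 4E 34 31 31 36 33 31 30 30 45 30 32 38 34 35 9D 0C 30 37 30"
)
text, stop = parse_known_tokens(sample)
print(text.decode("ascii"))
print(stop, hex(sample[stop]))  # 50 0x9d
```

Running this kind of parser over all 3,000 pairs and comparing the decoded prefix with the start of the matching input file would show how far the known rules go, and would collect every unexplained byte sequence together with the input text expected at that position.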