Hello Friends,
Today, I will show you how to investigate a corrupted PDF. For this purpose I have created a sample PDF. Before reading this article, I suggest you read another article, PDF Overview, for a better understanding of PDF structure.
Tools Required
1. PDF Reader
2. Notepad++ for editing.
So, let's get our hands dirty.
First, download this sample PDF and try to open it.
You will see this error message.
Now open this PDF in Notepad++.
Note: For simplicity, I have not encoded the PDF contents with any filters.
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/Count 2
/Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ]
/Type /Pages
>>
endobj
3 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Contents 4 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj
4 0 obj
<<
/Length 55
>>
stream
BT
/F1 18 Tf
186 690 Td
20 TL
(www.secsavvy.com) Tj
ET
endstream
endobj
5 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Contents 6 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj
6 0 obj
<<
/Length 45
>>
stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 1) Tj
ET
endstream
endobj
7 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Contents 8 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj
8 0 obj
<<
/Length 45
>>
stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 2) Tj
ET
endstream
endobj
9 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Contents 10 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj
10 0 obj
<<
/Length 45
>>
stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 3) Tj
ET
endstream
endobj
11 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Content 12 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj
12 0 obj
<<
/Length 47
>>
stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Password) Tj
ET
endstream
endobj
xref
0 13
0000000000 65535 f
0000000010 00000 n
0000000067 00000 n
0000000161 00000 n
0000000398 00000 n
0000000510 00000 n
0000000747 00000 n
0000000849 00000 n
0000001086 00000 n
0000001188 00000 n
0000001426 00000 n
0000001529 00000 n
0000001768 00000 n
trailer
<<
/Root 1 0 R
/Size 13
>>
startxref
1873
%%EOF
A PDF file consists of 4 elements:
- A PDF header identifying the version of the PDF specification.
- A body containing the objects that make up the document contained in the file.
- A cross-reference table containing information about the indirect objects in the file.
- A trailer giving the location of the cross-reference table and of certain special objects within the body of the file.
In our sample the file starts directly with 1 0 obj, so the header is missing. Add it as the first line of the file:
%PDF-1.7
Now we are able to open this PDF.
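The manual check for the four structural elements can also be sketched in Python. This is only a rough illustration, not a real PDF parser: the function name check_pdf_structure and the sample bytes are made up for this example, and it assumes a classic cross-reference table rather than a cross-reference stream.

```python
import re

def check_pdf_structure(data: bytes) -> dict:
    """Report which of the four structural elements appear in raw PDF bytes."""
    return {
        # The header should open the file, e.g. %PDF-1.7 (leading whitespace tolerated here)
        "header": data.lstrip().startswith(b"%PDF-"),
        # The body holds at least one "N G obj ... endobj" indirect object
        "body": b" obj" in data and b"endobj" in data,
        # A classic table starts with the keyword xref on its own line (not startxref)
        "xref": re.search(rb"(?:^|[\r\n])xref[\r\n ]", data) is not None,
        # Trailer dictionary plus the startxref pointer and the end-of-file marker
        "trailer": b"trailer" in data and b"startxref" in data and b"%%EOF" in data,
    }

# Hypothetical miniature file with the header missing, like our sample PDF
sample = (b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
          b"xref\n0 2\ntrailer\n<< /Root 1 0 R /Size 2 >>\nstartxref\n50\n%%EOF\n")

report = check_pdf_structure(sample)
missing = [name for name, present in report.items() if not present]
print(missing)  # ['header']
```

Running the same check on the file after adding %PDF-1.7 as the first line should report nothing missing.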
The reader shows that this PDF consists of 2 pages, as in the image above, but let's investigate further to verify it.
The Kids array of object 2 lists five page references (3 0 R, 5 0 R, 7 0 R, 9 0 R and 11 0 R), so this PDF actually has 5 pages in total. Edit the Count from 2 to 5 and open the PDF again.
%PDF-1.7
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/Count 5
/Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ]
/Type /Pages
>>
endobj
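The comparison between Count and the Kids array can be sketched in Python as well. This is a minimal illustration that assumes a flat page tree like the one in our sample, where every entry in Kids is a page; in a nested page tree, Count is the number of leaf pages below the node, so this simple comparison would not apply. The function name page_count_mismatch is made up for this example.

```python
import re

def page_count_mismatch(pages_obj: str):
    """Return (declared, listed) for a flat /Pages dictionary given as text."""
    # The number the reader trusts when deciding how many pages to show
    declared = int(re.search(r"/Count\s+(\d+)", pages_obj).group(1))
    # Everything between the square brackets of the /Kids array
    kids = re.search(r"/Kids\s*\[(.*?)\]", pages_obj, re.S).group(1)
    # Each child is an indirect reference of the form "N G R"
    listed = len(re.findall(r"\d+\s+\d+\s+R", kids))
    return declared, listed

# The corrupted Pages object from our sample, written on one line
pages = "<< /Count 2 /Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ] /Type /Pages >>"
declared, listed = page_count_mismatch(pages)
print(declared, listed)  # 2 5
```

When the two numbers differ, as they do here, Count is the value to correct.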
Now we can see all 5 pages, but the last page is blank, so we will investigate further.
The last page is pointed to by the indirect object reference 11 0 R.
11 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Content 12 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj
The Contents entry of a page object points to the content stream that describes the contents of that page. If this entry is absent, the page is empty.
But in object 11 the key is written as Content instead of Contents, followed by the reference 12 0 R. The PDF reader does not recognize the name Content, so it ignores the entry without giving any error and the page stays blank.
Replace Content with Contents and open the PDF. Now you are able to see all five pages.
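A misspelled key like this is easy to miss by eye, so here is a small Python sketch that lists page objects with no Contents entry. It is an assumption-laden illustration: the function name pages_without_contents is made up, it works on uncompressed text like our sample, and a hit is only a hint, since a page may legitimately be blank.

```python
import re

def pages_without_contents(pdf_text: str) -> list:
    """List object numbers of /Type /Page objects that lack a /Contents key."""
    suspects = []
    # Split the text into "N G obj ... endobj" blocks
    for num, body in re.findall(r"(\d+)\s+\d+\s+obj(.*?)endobj", pdf_text, re.S):
        # The lookahead keeps /Pages from being mistaken for /Page
        is_page = re.search(r"/Type\s*/Page(?![A-Za-z])", body)
        # The key is matched exactly, so the misspelled /Content does not count
        has_contents = re.search(r"/Contents(?![A-Za-z])", body)
        if is_page and not has_contents:
            suspects.append(int(num))
    return suspects

# Shortened, hypothetical versions of objects 11, 9 and 2 from our sample
sample = """
11 0 obj
<< /Type /Page /Parent 2 0 R /Content 12 0 R >>
endobj
9 0 obj
<< /Type /Page /Parent 2 0 R /Contents 10 0 R >>
endobj
2 0 obj
<< /Type /Pages /Count 5 >>
endobj
"""
print(pages_without_contents(sample))  # [11]
```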
You can download this corrected PDF from this link.
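Demo (High Quality)
or you can also watch it on YouTube.
If you are interested in reading more about PDF, I recommend visiting the excellent blog of Didier Stevens.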