Investigating a Corrupted PDF

Hello Friends,
Today, I will show you how you can investigate a corrupted PDF. For this purpose I have created a sample PDF. Before reading this article, I suggest you read the PDF Overview article for a better understanding of PDF structure.



Tools Required

1. PDF Reader
2. Notepad++ for editing.
So, let's get our hands dirty.
First, download this sample PDF and try to open it.
The PDF reader will show an error message instead of the document.
Now open this PDF in Notepad++.
Note: For simplicity, I have not encoded the PDF contents with any filters.
1 0 obj
<<
	/Pages 2 0 R
	/Type /Catalog
>>
endobj
2 0 obj
<<
	/Count 2
	/Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ]
	/Type /Pages
>>
endobj
3 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 4 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
4 0 obj
<<
	/Length 55
>>stream
BT
/F1 18 Tf
186 690 Td
20 TL
(www.secsavvy.com) Tj
ET
 
endstream
endobj
5 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 6 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
6 0 obj
<<
	/Length 45
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 1) Tj
ET
 
endstream
endobj
7 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 8 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
8 0 obj
<<
	/Length 45
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 2) Tj
ET
 
endstream
endobj
9 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 10 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
10 0 obj
<<
	/Length 45
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 3) Tj
ET
 
endstream
endobj
11 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Content 12 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
12 0 obj
<<
	/Length 47
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Password) Tj
ET
 
endstream
endobj
xref
0 13
0000000000 65535 f
0000000010 00000 n
0000000067 00000 n
0000000161 00000 n
0000000398 00000 n
0000000510 00000 n
0000000747 00000 n
0000000849 00000 n
0000001086 00000 n
0000001188 00000 n
0000001426 00000 n
0000001529 00000 n
0000001768 00000 n
trailer
<<
	/Root 1 0 R
	/Size 13
>>
startxref
1873
%%EOF
A PDF file consists of four elements:
  • A header identifying the PDF specification version.
  • A body containing the objects that make up the document contained in the file.
  • A cross-reference table containing information about the indirect objects in the file.
  • A trailer giving the location of the cross-reference table and of certain special objects within the body of the file.
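The four parts listed above can be checked mechanically before opening the file in an editor. The following is only a rough sketch: the function name pdf_parts_present is made up for this article, and it assumes a plain, unencoded file with a classic cross-reference table like the sample here.

```python
import re

def pdf_parts_present(data: bytes) -> dict:
    """Report which of the four structural parts appear in the raw bytes.

    Hypothetical helper: it only looks for the marker keywords and does
    not validate what follows them.
    """
    return {
        # Strict check: the header must be the very first bytes of the file.
        "header": data.startswith(b"%PDF-"),
        # At least one "N G obj" indirect object definition.
        "body": re.search(rb"\d+\s+\d+\s+obj\b", data) is not None,
        # The keyword "xref" alone on a line (so "startxref" does not count).
        "xref": re.search(rb"^xref\s*$", data, re.MULTILINE) is not None,
        # The trailer dictionary keyword plus the startxref pointer.
        "trailer": b"trailer" in data and b"startxref" in data,
    }
```

For a file like the sample, the expectation is that only the "header" entry comes back False.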
In this case there is no header, so we will add a PDF header and try to open the PDF again.

%PDF-1.7

Now we are able to open this PDF.
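The same fix can be scripted instead of done by hand in Notepad++. This is a minimal sketch, assuming the only header problem is that the line is missing; the function name ensure_pdf_header and the file names in the comments are hypothetical.

```python
def ensure_pdf_header(data: bytes, version: str = "1.7") -> bytes:
    """Return data unchanged if it already starts with a PDF header,
    otherwise return a copy with a "%PDF-<version>" line prepended."""
    if data.startswith(b"%PDF-"):
        return data
    return b"%PDF-" + version.encode("ascii") + b"\n" + data

# Possible usage (file names are placeholders):
#   with open("sample.pdf", "rb") as f:
#       fixed = ensure_pdf_header(f.read())
#   with open("sample_fixed.pdf", "wb") as f:
#       f.write(fixed)
```

Keep in mind that prepending bytes shifts every byte offset in the file, so whether the offsets in the cross-reference table still line up depends on how the file was produced.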


The PDF reader shows that this PDF consists of 2 pages, but let's investigate further to verify that.




The /Kids array of the Pages object (2 0 obj) lists five page references, while /Count says 2. So this PDF actually has 5 pages in total. Change /Count from 2 to 5 and open the PDF again.


%PDF-1.7
1 0 obj
<<
/Pages 2 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/Count 5
/Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ]
/Type /Pages
>>
endobj
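A mismatch between /Count and /Kids can also be found with a small script. The sketch below is an illustration only: the name count_vs_kids is hypothetical, it looks at the first /Kids array and the first /Count it finds, and it assumes a flat page tree (every entry in /Kids is a page) in an unencoded file, as in this sample.

```python
import re

def count_vs_kids(data: bytes):
    """Return (declared /Count, number of references in /Kids),
    or None if either entry cannot be found."""
    kids = re.search(rb"/Kids\s*\[(.*?)\]", data, re.DOTALL)
    count = re.search(rb"/Count\s+(\d+)", data)
    if kids is None or count is None:
        return None
    # Each indirect reference has the form "<object number> <generation> R".
    refs = re.findall(rb"\d+\s+\d+\s+R", kids.group(1))
    return int(count.group(1)), len(refs)
```

If the two numbers differ, as they do in the original sample, the /Count value deserves a closer look.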

Now we can see all 5 pages, but the last page is blank, so we will investigate further.
The last page is referenced by the indirect object reference 11 0 R.

11 0 obj
<<
/MediaBox [ 0 0 795 842 ]
/Parent 2 0 R
/Content 12 0 R
/Resources <<
/Font <<
/F1 <<
/Name /F1
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
>>
>>
>>
/Type /Page
>>
endobj



The Contents key points to the content stream that describes the contents of a page. If this entry is absent, the page is empty.
In object number 11, however, Contents is written as Content. The PDF reader does not recognize the name Content, so it ignores the entry without giving any error.
Replace Content with Contents and open the PDF. Now you are able to see all five pages. :)
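Since the reader gives no error for a misspelled key, a quick scan for page objects that have no Contents entry can point to the suspect. This is a rough sketch under the same assumptions as before (plain, unencoded objects); pages_without_contents is a name invented for this example.

```python
import re

def pages_without_contents(data: bytes):
    """Return the object numbers of /Type /Page objects that have no
    /Contents key. Such pages are either really blank or broken."""
    suspects = []
    for m in re.finditer(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", data, re.DOTALL):
        body = m.group(2)
        # \b keeps "/Type /Pages" (the page tree node) from matching.
        is_page = re.search(rb"/Type\s*/Page\b", body) is not None
        # \b keeps the misspelled "/Content" from counting as "/Contents".
        if is_page and re.search(rb"/Contents\b", body) is None:
            suspects.append(int(m.group(1)))
    return suspects
```

A page without Contents is not necessarily an error, because a blank page is legal, so the result is only a list of objects worth inspecting by hand.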


You can download this corrected PDF from this link.


Demo (High Quality)




You can also watch it on YouTube.
If you are interested in reading more about PDF, I recommend visiting the excellent blog of Didier Stevens.
