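How to Save PDF File Contents as Text Files in Microsoft Word
01. Today I ran into a problem: I had a large batch of PDF files (close to 50) and needed to pull out their contents for further analysis. The good news was that the PDFs contained copyable text rather than images. The bad news was that I had no PDF editing software such as Adobe Acrobat at hand.
02. A quick search online showed that Word can open a PDF file and edit it. The approach I came up with was to save the contents of each PDF file as a text file, and then work out how to extract the parts I needed.
03. The next step was to work out how to use Word VBA to get the result I wanted.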
Sub pdf_to_textfile()
    ' Suppress warning messages while Word converts and saves the files.
    Application.DisplayAlerts = wdAlertsNone

    ' Working folder path and current file name.
    Dim folder_path As String
    Dim file_name As String
    ' The opened PDF file and the new document that receives its text.
    Dim pdf_document As Document
    Dim new_document As Document

    ' Assume all PDF files are stored in this folder.
    folder_path = "C:\temp\PDF folder\"

    ' Get the name of the first PDF file in the folder.
    file_name = Dir(folder_path & "*.pdf")

    Do While file_name <> ""
        ' Set the working folder.
        ChangeFileOpenDirectory folder_path

        ' Open the PDF files one by one.
        Set pdf_document = Documents.Open(FileName:=file_name, ConfirmConversions:=False, _
            ReadOnly:=False, AddToRecentFiles:=False, PasswordDocument:="", _
            PasswordTemplate:="", Revert:=False, WritePasswordDocument:="", _
            WritePasswordTemplate:="", Format:=wdOpenFormatAuto, XMLTransform:="")

        ' Select all content inside the PDF file.
        Selection.WholeStory
        ' Copy all content to the clipboard.
        Selection.Copy

        ' Create a new document in Word.
        Set new_document = Documents.Add
        ' Paste the clipboard into the new document.
        new_document.Content.Paste

        ' Save the new document as a text file with encoding 65001 (UTF-8).
        new_document.SaveAs2 FileName:=file_name & ".txt", FileFormat:=wdFormatText, _
            Encoding:=65001, LockComments:=False, Password:="", AddToRecentFiles:=True, _
            WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
            SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:=False, _
            InsertLineBreaks:=False, AllowSubstitutions:=False, LineEnding:=wdCRLF, _
            CompatibilityMode:=0

        ' Close the new document.
        new_document.Close SaveChanges:=wdDoNotSaveChanges
        ' Close only the PDF file; a bare Documents.Close would close every open document.
        pdf_document.Close SaveChanges:=wdDoNotSaveChanges

        ' Move on to the next PDF file.
        file_name = Dir()
    Loop

    ' Restore the normal warning messages.
    Application.DisplayAlerts = wdAlertsAll
End Sub
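The macro above passes the text through the clipboard and a second document. As a possible variation, the sketch below saves the converted PDF straight to a text file instead. It has not been tested against these files, and it assumes that Word lets a converted PDF be saved directly in text format. The name `pdf_to_textfile_direct` is made up for illustration; the folder path is the same assumed one as above.

```vba
' Hypothetical variant: save each converted PDF directly as UTF-8 text,
' without going through the clipboard or a second document.
Sub pdf_to_textfile_direct()
    Dim folder_path As String
    Dim file_name As String
    Dim pdf_document As Document

    ' Same assumed folder as in the macro above.
    folder_path = "C:\temp\PDF folder\"

    ' Suppress warning messages during conversion and saving.
    Application.DisplayAlerts = wdAlertsNone

    file_name = Dir(folder_path & "*.pdf")

    Do While file_name <> ""
        ' Word converts the PDF into an editable document when it opens it.
        Set pdf_document = Documents.Open(FileName:=folder_path & file_name, _
            ConfirmConversions:=False, ReadOnly:=False, AddToRecentFiles:=False)

        ' Write the converted document as text; 65001 is the UTF-8 code page.
        pdf_document.SaveAs2 FileName:=folder_path & file_name & ".txt", _
            FileFormat:=wdFormatText, Encoding:=65001, _
            AddToRecentFiles:=False, LineEnding:=wdCRLF

        ' Close only this document, without asking to save again.
        pdf_document.Close SaveChanges:=wdDoNotSaveChanges

        file_name = Dir()
    Loop

    ' Restore the normal warning messages.
    Application.DisplayAlerts = wdAlertsAll
End Sub
```

If the output from this variant differs from the copy-and-paste result (for example in how tables or headers are flattened), the original macro is the safer choice.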
