How to Read the Contents of Doc, Docx, and PDF Documents in C#
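This article uses an example to show how to read the contents of Doc, Docx, and PDF documents in C#. It is shared here for your reference. The details are as follows: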
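Doc documents: Microsoft Word 14.0 Object Library (a COM component registered in the GAC. Word must be installed before it can be called, and the COM version number differs depending on the installed Word version.)
Docx documents: Microsoft Word 14.0 Object Library (the same component, with the same requirement that Word be installed; the COM version number again depends on the installed Word version.)
PDF documents: PDFBox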
```csharp
/* Author: GhostBear */
using System;
using System.Text.RegularExpressions;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
using Microsoft.Office.Interop.Word;

namespace TestPdfReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // PDF: extract the text with PDFBox, then release the document
            PDDocument doc = PDDocument.load(@"C:\resume.pdf");
            PDFTextStripper pdfStripper = new PDFTextStripper();
            string text = pdfStripper.getText(doc);
            doc.close();
            // Turn tabs and line breaks into spaces, then remove all spaces
            string result = text.Replace('\t', ' ').Replace('\n', ' ').Replace('\r', ' ').Replace(" ", "");
            Console.WriteLine(result);

            // Doc, Docx: open the file read-only through Word automation (Word must be installed)
            object docPath = @"C:\resume.doc";
            object docxPath = @"C:\resume.docx"; // pass this to Open(...) instead to read the .docx file
            object missing = System.Reflection.Missing.Value;
            object readOnly = true;
            Application wordApp = new Application();
            try
            {
                Document wordDoc = wordApp.Documents.Open(ref docPath, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);
                string text2 = FilterString(wordDoc.Content.Text);
                wordDoc.Close(ref missing, ref missing, ref missing);
                Console.WriteLine(text2);
            }
            finally
            {
                // Always shut down the Word process, even if reading fails
                wordApp.Quit(ref missing, ref missing, ref missing);
            }
            Console.Read();
        }

        // Remove Word's \a control characters (table cell markers) and all whitespace
        private static string FilterString(string input)
        {
            return Regex.Replace(input, @"(\a|\t|\n|\s+)", "");
        }
    }
}
```
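The Word interop approach requires Word to be installed on the machine that runs the program. As an alternative that the original article does not cover, a .docx file (but not a legacy binary .doc file) can be read without Word, because it is a ZIP archive whose main body is stored as XML in `word/document.xml`. The following is a minimal sketch, assuming .NET Framework 4.5 or later (for `System.IO.Compression.ZipArchive`, which needs a reference to System.IO.Compression.dll) and that plain paragraph text is all you need. The class name `DocxTextReader` is hypothetical. It ignores headers, footers, footnotes, tabs, and line breaks inside a paragraph, and text inside text boxes may appear twice.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: reads paragraph text from a .docx without Word installed
static class DocxTextReader
{
    // WordprocessingML namespace used by word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph (<w:p>), one paragraph per line
    public static string ReadText(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (Stream s = entry.Open())
            {
                XDocument xml = XDocument.Load(s);
                var sb = new StringBuilder();
                foreach (XElement p in xml.Descendants(W + "p"))
                {
                    // Concatenate all text runs (<w:t>) inside the paragraph
                    sb.AppendLine(string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                }
                return sb.ToString();
            }
        }
    }

    // Convenience overload that opens the file from disk
    public static string ReadText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return ReadText(fs);
        }
    }
}
```

Usage, with the same sample path as the article: `Console.WriteLine(DocxTextReader.ReadText(@"C:\resume.docx"));`. The result can then be passed through the article's `FilterString` if all whitespace should be removed.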
I hope this article helps with your C# programming.