When you search for PDF files, by default SharePoint only looks at their metadata and returns results based on it. In addition, PDF files are displayed with the icon for an unknown document type and, more importantly, SharePoint cannot search within the content of PDF files.
In this post, I will show you how to configure the PDF icon in search results and document libraries, and how to enable full-text search of PDF file content.
1. Configure PDF Icon
– Download the PDF icon from the Adobe website (http://www.adobe.com/images/pdficon_small.gif) and save it to the Images folder under the 14 hive (in my case C:\Program Files\Common Files\Microsoft Shared\Web server extensions\14\Template\Images)
– Open a command prompt and run IISRESET -Stop to stop IIS before editing Docicon.xml
– Navigate to the XML folder at C:\Program Files\Common Files\Microsoft Shared\Web Server extensions\14\Template\Xml and open Docicon.xml in Notepad
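– Insert <Mapping Key="pdf" Value="pdficon_small.gif" /> within the <ByExtension> section (XML is case-sensitive, so use the same Mapping/Key/Value spelling as the existing entries, straight quotes, and no trailing space inside the Value)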
– Save the file and run IISRESET -Start
– Add PDF to the supported file types list:
- Open SharePoint Central Administration and log in with administrator permissions
- Click the Manage service applications link under the Application Management section
- Click Search Service Application
- On the Search Administration page, click the File Types link in the left sidebar
- Select “New File Type”
- In the File extension textbox, type PDF and then click OK
- Go back to the file types list and confirm that PDF appears in the list with the PDF icon.
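For reference, the following is an abridged sketch of where the pdf entry sits inside Docicon.xml. It assumes the stock SharePoint 2010 layout (a DocIcons root containing ByProgID, ByExtension and Default sections) and the pdficon_small.gif file name used above; all existing entries are elided as comments, and only the pdf Mapping line is new.

```xml
<!-- Abridged sketch of 14\Template\Xml\Docicon.xml (assumed stock layout); only the pdf entry is new -->
<DocIcons>
  <ByProgID>
    <!-- existing ProgID mappings, left unchanged -->
  </ByProgID>
  <ByExtension>
    <!-- existing extension mappings, left unchanged -->
    <!-- New entry: Key is the file extension, Value is the image file saved in 14\Template\Images -->
    <Mapping Key="pdf" Value="pdficon_small.gif" />
  </ByExtension>
  <Default>
    <!-- existing default icon mapping, left unchanged -->
  </Default>
</DocIcons>
```

If the icon still does not change after IISRESET -Start, a typo in the Value file name or curly quotes pasted from a web page are likely first suspects.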
2. Install Adobe PDF iFilter
– PDF is a standard format published by Adobe, which is why SharePoint does not search PDF content by default. Adobe released Adobe PDF iFilter 9 for 64-bit platforms, which allows PDF files to be searched on 64-bit Microsoft Windows platforms by desktop search, Microsoft Office SharePoint Server, Microsoft Exchange Server, and Microsoft SQL Server. For more information and to download iFilter version 9, visit http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025
– Download and install PDFiFilter64installer.exe
– Follow the installation instructions to complete the installation and configure SharePoint Server to enable PDF indexing. These instructions are written for SharePoint 2007, but the steps are nearly the same for SharePoint 2010.
– Finally, log in to Central Administration, navigate to Manage service applications -> Search Service Application -> Content Sources, select the content source and start a full crawl. A full crawl is required because an incremental crawl indexes files against the old file type list; when a new file type is added, only a full crawl picks up all files of the types in the new list.
– Test by searching for text that appears inside a PDF file; PDF files that match on their content should now appear in the search results.
If you are not confident with the manual configuration above, you can use a PowerShell script that automates everything, from configuring the PDF icon to completing the PDF iFilter installation. Download the script here. Thanks to Josko for the useful script.
Cheers
Hoang Nhut Nguyen
Email: nhutcmos@gmail.com
Helpful post, thanks!
Thanks for the post, but I see that it is not searching inside the PDF files. Please advise.
Did you miss any step, or were there any errors during the installation process?
You can download the installation script at https://docs.google.com/open?id=0B55FfAMp1BXdMDRhNzQyZjktM2Y5NS00MjY4LTkwYjUtY2QwOWUxOTQ3MzY0 for automatic setup.
I have not missed any step. The thing is that my application server is on a different machine, with all prerequisites such as the IIS role, and the database server is on another machine. I have installed this on the web/application server.
On the application server, I do not see these directories to install into. Please advise.
I have done the same steps as described at http://support.microsoft.com/kb/2293357, and it works fine on my test server (single farm), but it is not working yet on the production servers (multi-farm).
My production servers are:
2 front-end servers (Server Farm 1 & Server Farm 2) running under NLB
2 storage servers (clustered)
I have checked the crawl log (PDF content is not being crawled).
Is there anything left to do?
Any help?
If you are not confident with the manual configuration above, you can use a PowerShell script that automates everything, from configuring the PDF icon to completing the PDF iFilter installation. Download the PowerShell script here: https://docs.google.com/open?id=0B55FfAMp1BXdMDRhNzQyZjktM2Y5NS00MjY4LTkwYjUtY2QwOWUxOTQ3MzY0.
Thanks nhutcmos.
The script is really useful and works fine in a single farm (separate DB). I was able to make it work in a multi-farm setup after some additional steps.
Others can find more details at http://social.msdn.microsoft.com/Forums/en-US/sharepointgeneralprevious/thread/2d4470ef-fd0a-4e12-8886-68837865dfba?prof=required
Thanks for your link, it’s really helpful for all of us.
So lucky to have found your post. I have a task where a customer requires full-text search for PDFs in SharePoint. Whenever you get a chance to come to Hà Nội, I’ll treat you to coffee. This is Giáp from fsoft.
Hi Giap, long time no see. I’m very glad this post was helpful to you. Wishing you productive work!
Hi, I am using SharePoint 2010. We have a number of PDF documents that are secured, meaning they are copy protected (copying and pasting content is restricted) and print protected.
I tried installing Adobe PDF iFilter v6 and then v9 as well, but these documents are not being found by search.
The PDFs have been secured using Acrobat Professional XI, where Changes Allowed and Printing Allowed are set to None. The ‘Enable copying of text, images and other content’ option is also unchecked.
Hi Viveka.
Actually, PDF iFilter recognizes the parts of a file that contain extractable text and indexes them for searching.
You can try opening these PDF files in Acrobat Reader, using the Select Text function to select the text, and copying and pasting it into a text file.
If you get the text, PDF iFilter can index it and the file will be searchable. Otherwise, these PDF files cannot be searched.
Does the above solution work if the PDFs are print/copy protected?
You can open the PDF file in any PDF reader, such as Acrobat Reader or Foxit, then press Ctrl+F to search for your text within the file. If the text is found, your PDF file can be indexed and searched with this solution.
Hi, for me the wrong icons are displayed if the PDFs are inside a folder in a document library. Any idea on this?
You need to check DocIcon.xml under the path C:\Program Files\Common Files\Microsoft Shared\Web Server extensions\14\Template\Xml
Then find the pdf key and check that it points to the correct icon file name.