Traditionally, websites have consisted of HTML-based web pages. They may also link to other document types, such as PDFs and Word and PowerPoint documents. These documents contain text and images that may be crucial for your business, so it is important that visitors can find their content.
In this article, we’ll review AddSearch’s document types feature. First, we’ll describe some use cases to show where finding PDFs and Office documents is important. Then we’ll look at how you can set up document types, what we crawl and index from the documents, and how you can filter search results based on document type.
Our clients make PDFs and Microsoft Office documents searchable for many reasons. Here are some popular examples:
- Governments and municipalities produce public documents, most of which come in digital form. As public organizations, their decision-making needs to be transparent and their documents need to be easily discoverable.
- E-commerce stores have technical specifications and promotional documents from manufacturers. When these are made searchable, customers can find them easily.
- Some companies provide products that require blueprints with careful measurements. The blueprints may, for instance, cover furniture parts, nuts, and bolts. Blueprints are commonly provided as PDFs.
- Educational organizations and learning marketplaces organize courses where the materials may include PDFs, PowerPoint presentations, and web pages.
In all of these cases, making PDFs and Microsoft Office documents searchable improves the user experience and makes the information more accessible. All of the use cases may also benefit from a mix of search results that pins together product web pages and documents. You can pin search results with the pinned results feature. Read more about pinned results in our documentation and an earlier blog post.
Document types indexing is available to free trial users and enterprise customers. On the Small and Large subscription plans, support is available with the purchase of the Plus package add-on.
Setting up document types
You can set up the document types feature by following these instructions or by taking the following steps:
- Log in to your AddSearch account
- Navigate to Domains and crawling
- Under Document types (PDF, Word, PowerPoint), enable PDF support
Changing this setting requires a full re-crawl:
- Navigate to Index tools
- Initiate a full re-crawl
What is indexed and shown in the search results?
In addition to the content, AddSearch indexes the metadata from PDFs and Microsoft Office documents. There are settings we can use to enhance what is indexed as well as what is shown in the search results. Please contact AddSearch Customer Support if you need help setting up the search.
Filtering search results based on document types
You can filter the search results based on document types with category filters. Below you can see the supported category filters for document types
The category filter has the following syntax, which you can add to the search script:
categories=doctype_pdf
The following scripts show how to use document types as filters in the search widget script. Replace #### with your site key.
To include search results with PDF as the document type, set categories=doctype_pdf.
To include search results with pptx as the document type, set categories=doctype_pptx.
You can also combine document types to include in the search results. For instance, the following filter includes search results with PDF, pptx, and doc as the document type:
<script src="https://addsearch.com/js/?key=####&categories=doctype_pdf,doctype_pptx,doctype_doc"></script>
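As a rough sketch, script tags like the one above could be generated with a small JavaScript helper. The function names buildWidgetSrc and buildWidgetTag are hypothetical and not part of AddSearch. The base URL and the key and categories parameters are taken from the example above, and joining the categories with commas and no spaces is an assumption.

```javascript
// Hypothetical helpers, not part of AddSearch. They only assemble the
// widget script URL in the form shown in this article.

// Build the script URL for a site key and a list of document types,
// e.g. ["pdf", "pptx"]. Each type becomes a category "doctype_<type>".
function buildWidgetSrc(siteKey, docTypes) {
  if (!Array.isArray(docTypes) || docTypes.length === 0) {
    throw new Error("Provide at least one document type, e.g. 'pdf'");
  }
  const categories = docTypes
    .map((type) => "doctype_" + String(type).trim().toLowerCase())
    .join(","); // assumption: comma-separated, no spaces
  return "https://addsearch.com/js/?key=" + siteKey + "&categories=" + categories;
}

// Wrap the URL in a script tag that can be pasted into a page.
function buildWidgetTag(siteKey, docTypes) {
  return '<script src="' + buildWidgetSrc(siteKey, docTypes) + '"></script>';
}

// "####" is the site key placeholder used in this article.
console.log(buildWidgetTag("####", ["pdf"]));
console.log(buildWidgetTag("####", ["pptx"]));
console.log(buildWidgetTag("####", ["pdf", "pptx", "doc"]));
```

Running the sketch prints one script tag for PDF results, one for pptx results, and one for the combined filter. Keeping the document types in an array makes it easy to add or remove a type without editing the URL by hand.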
For more information on category filters, visit the documentation.
The use cases above show why it is important to make different document types searchable: websites have important content that may come in many formats. Regardless of the document type, your users should be able to find exactly the content they need easily.