API Reference & Documentation

Developers ahoy! With our REST API you can customize
the AddSearch experience to meet your wildest dreams.


AddSearch REST API

AddSearch’s REST API provides programmatic access to query and manipulate the data in your search index. The current version of the API is v1. We are expanding the API’s functionality based on your feedback, so feel free to contact us with any ideas.

Base URL

All API URLs start with the following base URL:
https://api.addsearch.com/v1
Access is always over HTTPS. Calls made over plain HTTP return 405 Method Not Allowed.

Content type

API endpoints consume and produce JSON:
application/json
Calls with a JSON payload must include the Content-Type header, which can be added in curl with the following switch:

curl -H 'Content-Type:application/json' https://api...

Authentication

Authentication is done with HTTP Basic Auth. Your index’s SITEKEY is the username and your secret API key is the password. You’ll find your SITEKEY and secret API key on the Dashboard’s Installation page. HTTP authentication in curl is done with the user switch:

curl --user 'sitekey:secret-api-key' https://api...
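
For callers that do not use curl, the same Basic Auth header can be assembled by hand. The following is a minimal Python sketch, not an official client: the function name basic_auth_headers is hypothetical, and it assumes the standard Basic scheme of base64 over "username:password" with the SITEKEY as username and the secret API key as password, as described above.

```python
import base64


def basic_auth_headers(sitekey: str, secret_api_key: str) -> dict:
    """Build request headers for an authenticated JSON call.

    Hypothetical helper: the SITEKEY is the username and the secret
    API key is the password, joined with a colon and base64-encoded.
    """
    credentials = f"{sitekey}:{secret_api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }


headers = basic_auth_headers("sitekey", "secret-api-key")
print(headers["Authorization"])
```

Keep the secret API key on the server side only; the header is reversible base64, not encryption.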

Please note: the Search API does not require authentication.

Date Format

The AddSearch API uses the ISO 8601 standard as its date format. An example of an accepted timestamp is:
2015-01-30T11:17:22-02:00
Read more about ISO 8601 at w3.org.
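
As an illustration, timestamps in this format can be parsed with Python's standard library. This is a sketch under assumptions: parse_timestamp is a hypothetical helper name, and the trailing "Z" handling is included only because some example responses in this reference use that suffix, which older Python versions do not parse directly.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2015-01-30T11:17:22-02:00.

    A trailing "Z" (UTC) is rewritten to "+00:00" so that
    datetime.fromisoformat accepts it on Python versions before 3.11.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


ts = parse_timestamp("2015-01-30T11:17:22-02:00")
# Normalize to UTC before comparing timestamps with different offsets.
print(ts.astimezone(timezone.utc).isoformat())
```

A timestamp without an offset parses to a naive datetime, so it should not be compared directly with offset-aware values.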

Rate limits

By default, rate limits are monitored over a 15-minute window. Every API call returns rate limit information in the following headers:

X-Rate-Limit-Limit: The request limit that applies to the given request
X-Rate-Limit-Remaining: Requests left in the current 15-minute window
X-Rate-Limit-Reset: The time when the current usage count resets (seconds since Unix epoch)

Example headers returned by an API call:

X-Rate-Limit-Limit: 100
X-Rate-Limit-Remaining: 97
X-Rate-Limit-Reset: 1422615270
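
A client can use these headers to decide whether to pause before its next call. The sketch below is one possible reading, not prescribed behavior: rate_limit_status is a hypothetical helper, and it assumes the headers arrive as a plain dict with exactly the names listed above (real HTTP libraries may expose header names case-insensitively).

```python
import time


def rate_limit_status(headers: dict, now: float = None) -> dict:
    """Summarize the rate limit headers of one API response.

    wait_seconds is 0 while requests remain; once the window is
    exhausted it is the time left until X-Rate-Limit-Reset.
    """
    limit = int(headers["X-Rate-Limit-Limit"])
    remaining = int(headers["X-Rate-Limit-Remaining"])
    reset_at = int(headers["X-Rate-Limit-Reset"])  # seconds since Unix epoch
    if now is None:
        now = time.time()
    wait_seconds = max(0.0, reset_at - now) if remaining <= 0 else 0.0
    return {"limit": limit, "remaining": remaining, "wait_seconds": wait_seconds}


example = {
    "X-Rate-Limit-Limit": "100",
    "X-Rate-Limit-Remaining": "97",
    "X-Rate-Limit-Reset": "1422615270",
}
print(rate_limit_status(example))
```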

API Endpoints

Search

You can query your AddSearch index with the Search endpoint:
GET /search/{index public key}

Mandatory query parameters are:

  • term: Search term (aka keyword)

Optional query parameters are:

  • limit: Number of results to return per page (default: 10. Must be 1-50)
  • page: Page to return (default: 1)
  • jsonp: JavaScript function call wrapped around the response JSON
  • lang: Return results only with this language (e.g. “en” or “de”)
  • categories: Limit search to certain categories (domain or URL path). E.g. “0xdomain.com” returns results only from domain.com, and “1xnews” returns results from the “domain.com/news/*” path
  • sort: relevance (default) or date
  • order: desc (default) or asc. Only applicable if “sort=date”
  • fuzzy: false (default) or true. When true, also matches words that are close to the given keywords. The suggested approach is to search with fuzzy off first and, if there are no exact results, send a second request with it turned on.
  • dateFrom: return only results that are newer than given date, in yyyy-MM-dd format (example: 2018-12-15)
  • dateTo: return only results older than given date, in yyyy-MM-dd format
  • customField: return only results containing the given custom field and value pair, in “key=value” URL encoded format (so key%3Dvalue). You can define multiple custom field pairs by adding additional customField parameters to the query. If the same custom field name is given with different values, results matching any one of the values are returned. If multiple custom field names are defined, results must match each criterion. Example: &customField=city%3Dlondon&customField=genre%3Drock&customField=genre%3Dpop (city=London AND (genre=rock OR genre=pop))
Please note: the Search API uses your public SITEKEY, not your secret API key.
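
The query parameters above can be assembled into a request URL programmatically. The following Python sketch shows one way to do it; build_search_url and its argument names are hypothetical, and only the base URL, the path, and the parameter names come from this reference. Repeated customField parameters are produced by passing (key, value) pairs.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.addsearch.com/v1"


def _as_text(value) -> str:
    """Render Python booleans as the lowercase true/false the API lists."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_search_url(sitekey, term, custom_fields=(), **params) -> str:
    """Build a Search endpoint URL (hypothetical helper).

    params holds optional parameters such as limit, page, lang or fuzzy;
    custom_fields is an iterable of (key, value) pairs, each sent as its
    own customField parameter in URL-encoded key=value form.
    """
    query = [("term", term)]
    for name in sorted(params):
        query.append((name, _as_text(params[name])))
    for key, value in custom_fields:
        query.append(("customField", f"{key}={value}"))
    return f"{BASE_URL}/search/{sitekey}?{urlencode(query)}"


url = build_search_url(
    "1bed1ffde465fddba2a53ad3ce69e6c2",
    "rest api",
    custom_fields=[("city", "london"), ("genre", "rock"), ("genre", "pop")],
    limit=20,
)
print(url)
```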

For example:
https://api.addsearch.com/v1/search/1bed1ffde465fddba2a53ad3ce69e6c2?term=rest+api
Returns:

{
  page: 1,
  total_hits: 1,
  hits: [
    {
      id: "54f5b92d4e4766f4bc0ce2b05f80f58d",
      url: "https://www.addsearch.com/developers/api/",
      title: "AddSearch REST API",
      meta_description: "Documentation of our REST API",
      meta_categories: ["features", "api"], // <meta name="addsearch-category" content="features/api" />
      custom_fields: {
        location: "London",
        genre: ["Rock", "Pop"]
      },
      highlight: "AddSearch’s <em>REST API</em> provides programmatic access to your search index",
      ts: "2015-01-22T11:56:10",
      categories: [
        "0xwww.addsearch.com",
        "1xdevelopers",
        "2xapi"
      ],
      images: {
        main: "https://d20vwa69zln1wj.cloudfront.net/1bed1ffde465fddba...",
        capture: "https://d20vwa69zln1wj.cloudfront.net/1bed1ffde465fddba..."
      },
      score: 0.790107
    }
  ]
}

Fields in the returned JSON are:

  • page: Page number passed as a query parameter
  • total_hits: Total number of documents matching the search term

Elements in the hits array:

  • id: Document’s ID (md5 of the URL)
  • url: Document’s URL
  • title: Document’s title
  • meta_description: Document’s meta description
  • highlight: Part of the document’s content
  • ts: Document’s publishing date. If unknown, the time when the document was initially indexed
  • categories: Categories where the page belongs. These can be used to filter down the search query to a specific domain or path part
  • images.main: URL of the main image (e.g. og:image). Null if missing
  • images.capture: URL of the screen capture. Null if missing
  • score: How well the search term matches the document
  • custom_fields: custom fields defined for the document. Each value is either a string or a string array (if multiple values defined).
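
The total_hits field makes it straightforward to implement the fuzzy-search suggestion from the parameter list: search exactly first, and retry with fuzzy matching only when nothing is found. The sketch below assumes a hypothetical fetch callable that performs the HTTP request and returns the parsed response as a dict; it is passed in so the fallback logic stays independent of any particular HTTP library.

```python
def search_with_fuzzy_fallback(fetch, term):
    """Search exactly first, then retry with fuzzy=true if nothing matched.

    fetch is a hypothetical callable taking (term, fuzzy) and returning
    the parsed JSON response of the Search endpoint as a dict.
    """
    response = fetch(term, fuzzy=False)
    if response.get("total_hits", 0) > 0:
        return response
    return fetch(term, fuzzy=True)


def fake_fetch(term, fuzzy):
    """Stand-in for a real HTTP call: only the fuzzy search finds a hit."""
    if fuzzy:
        return {"page": 1, "total_hits": 1, "hits": [{"title": "AddSearch REST API"}]}
    return {"page": 1, "total_hits": 0, "hits": []}


result = search_with_fuzzy_fallback(fake_fetch, "rset api")
print([hit["title"] for hit in result["hits"]])
```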

Please note: the Search API does not require authentication.

The rate limit for the Search API is 5 requests/sec from a single IP address. There are no limits based on the total search volume or the number of requests coming from different IP addresses. If you implement “search-as-you-type” functionality, throttling requests to a delay of about 200 ms between them is recommended.
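
One simple way to apply the recommended delay is a throttle that rejects calls arriving too soon after the last accepted one. This is a sketch of that idea, not the only option (debouncing keystrokes is a common alternative): the Throttle class and its injectable clock parameter are hypothetical and exist here so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Allow at most one call per min_interval seconds (default 200 ms)."""

    def __init__(self, min_interval: float = 0.2, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._last_allowed = None

    def allow(self) -> bool:
        """Return True if a request may be sent now, and record it."""
        now = self._clock()
        if self._last_allowed is None or now - self._last_allowed >= self.min_interval:
            self._last_allowed = now
            return True
        return False


throttle = Throttle()
if throttle.allow():
    print("send search request")
```

With a 200 ms interval a single client sends at most 5 requests per second, which matches the stated per-IP limit.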

Get document’s status

You can get the status of a document with the following request. The doc id is the MD5 hash of the full URL, including the protocol and any query parameters. For example, the doc id of https://www.addsearch.com/ is 3b1d053e2fdf65f178dc5d1b5bd00f75

GET /indices/{index public key}/documents/{doc id}
The API call returns the following information:

{
  indexPublicKey: "index public key",
  docId: "md5 of url",
  status: "INDEXED|EXCLUDED|PENDING|ERROR|UNKNOWN",
  statusInfo: "Duplicate of another-doc-id",
  lastFetched: "2015-01-13T13:43:01.000Z",
  duplicateOf: {
    href: "https://api.addsearch.com/v1/indices/{index public key}/documents/{doc id}"
  },
  content: {
    href: "https://api.addsearch.com/v1/indices/{index public key}/documents/{doc id}/content"
  }
}
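
The doc id used in the document endpoints can be derived from a page URL on the client side. The Python sketch below shows the computation; the helper names doc_id and document_status_url are hypothetical, and hashing the UTF-8 bytes of the URL is an assumption, since this reference does not state the encoding.

```python
import hashlib

BASE_URL = "https://api.addsearch.com/v1"


def doc_id(page_url: str) -> str:
    """Return the MD5 hex digest of the full URL.

    The URL must include the protocol and any query parameters;
    UTF-8 encoding of the URL is an assumption of this sketch.
    """
    return hashlib.md5(page_url.encode("utf-8")).hexdigest()


def document_status_url(sitekey: str, page_url: str) -> str:
    """Build the status endpoint URL for a page (hypothetical helper)."""
    return f"{BASE_URL}/indices/{sitekey}/documents/{doc_id(page_url)}"


print(document_status_url("SITEKEY", "https://www.addsearch.com/"))
```

Note that the hash is sensitive to every character: a trailing slash or a different protocol yields a different doc id.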

Get document’s contents

Following the link in the “content” property of the document status response returns the indexed content of the document.

GET /indices/{index public key}/documents/{doc id}/content

The response is of the following form:

{
  title: "An example page",
  h1: "The heading on an example page",
  h2: "",
  mainContent: "The indexed content on an example page",
  documentDate: "2015-02-10T14:11:13.000Z",
  language: "en",
  hiddenKeywords: null
}

The “documentDate” field is the ISO 8601 date on which the document was created, if that information is available in the source document’s metadata; otherwise it is the date the document was initially indexed.

Hidden keywords are a space-delimited list of manually defined keywords that match this document when used as search keywords, even though they do not appear in the document’s indexed content.

Modify document’s hidden keywords

You can modify the hidden keywords of a document by POSTing the new value to this endpoint.

POST /indices/{index public key}/documents/{doc id}/content/hiddenKeywords

payload:

{
  hiddenKeywords: "list of space delimited hidden keywords"
}

The endpoint returns HTTP 200 OK if successful.
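
Because the payload is a single space-delimited string, it can be convenient to build it from a list. The following is a small sketch; hidden_keywords_payload is a hypothetical helper, and splitting multi-word entries into separate keywords is a choice made here so that every entry stays a single space-free token.

```python
import json


def hidden_keywords_payload(keywords) -> str:
    """Serialize keywords into the JSON body for the hiddenKeywords endpoint.

    Each entry is split on whitespace, so "pricing plans" contributes
    two keywords; empty entries are dropped.
    """
    tokens = []
    for keyword in keywords:
        tokens.extend(keyword.split())
    return json.dumps({"hiddenKeywords": " ".join(tokens)})


body = hidden_keywords_payload(["pricing", " plans ", "enterprise tier"])
print(body)
```

The resulting string is sent as the POST body together with the Content-Type and Basic Auth headers described earlier.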

Add a page to the index or re-crawl a URL

You can add new pages to your index or re-crawl existing documents with the following endpoint. Re-indexing is executed within a minute or two.

POST /crawler

payload:

{
  action: "FETCH",
  indexPublicKey: "SITEKEY",
  url: "http://foo.com/bar.html"
}

Returns HTTP 202 Accepted with a payload such as:

{
  message: "Scheduled",
  docId: "doc id"
}
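
Putting the pieces together, a crawl request can be prepared with Python's standard library. This is a sketch under stated assumptions: build_fetch_request is a hypothetical helper, and it assumes the crawler endpoint requires the Basic Auth credentials described in the Authentication section. The request is only constructed here; sending it is left to the caller.

```python
import base64
import json
from urllib.request import Request

BASE_URL = "https://api.addsearch.com/v1"


def build_fetch_request(sitekey: str, secret_api_key: str, page_url: str) -> Request:
    """Prepare (but do not send) a POST /crawler request for one URL."""
    body = json.dumps({
        "action": "FETCH",
        "indexPublicKey": sitekey,
        "url": page_url,
    }).encode("utf-8")
    # Assumption: Basic Auth with SITEKEY as username, secret key as password.
    token = base64.b64encode(f"{sitekey}:{secret_api_key}".encode("utf-8")).decode("ascii")
    return Request(
        f"{BASE_URL}/crawler",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_fetch_request("SITEKEY", "secret-api-key", "http://foo.com/bar.html")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then check for HTTP 202.
```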