Data Acquisition & Enrichment

XSB Focused Crawlers

The collection of rich product information is an important aspect of supply chain management, but it is also the major cost driver. Technical data about parts, and logistics information such as price, availability and lead time, are often available on manufacturer or distributor web sites. This information can be used to enrich part characteristics and help buyers and engineers efficiently find and compare items from multiple sources. Many organizations rely on manual data collection, traditional “web crawlers” or search engines to aggregate product data; these methods have significant drawbacks.

How does Focused Crawler Technology work?
XSB’s patent-pending Focused Crawler technology is a next-generation automated web data acquisition system. Focused Crawlers enable the automated collection of precise, product-specific data from web-based catalogs. Unlike traditional web crawlers, Focused Crawlers do not rely on sitemaps, making them resilient to site changes and enabling them to scale to collect product information from hundreds or thousands of web sites. Importantly, they are able to do this without human intervention, making them a low-cost alternative to manual data acquisition strategies.

Focused Crawlers work by analyzing the structure of a web site; they can identify which pages are product pages and can also recognize pages containing lists of products. They then compare similar product pages to one another to learn which important properties are described for a product and how to differentiate them from the other information on the page. Once this is done, only relevant product information is extracted; the rest of the data on the page, the “noise” as we call it, is ignored.
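The page-comparison idea described above can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions and not XSB's actual implementation: it assumes each product page has already been reduced to label/value pairs, and it treats a label as a product property when it recurs across pages with differing values, while text that is identical on every page is treated as noise. The function names, the `min_share` threshold and the sample data are all hypothetical.

```python
from collections import defaultdict


def learn_property_labels(pages, min_share=0.8):
    """Return labels that recur across similar pages but whose values vary.

    Assumption for this sketch: each page is a dict of label -> value.
    """
    values_by_label = defaultdict(list)
    for page in pages:
        for label, value in page.items():
            values_by_label[label].append(value)

    learned = set()
    for label, values in values_by_label.items():
        recurring = len(values) / len(pages) >= min_share  # present on most pages
        varying = len(set(values)) > 1                     # not identical everywhere
        if recurring and varying:
            learned.add(label)
    return learned


def extract_product(page, labels):
    """Keep only the learned product properties; everything else is noise."""
    return {label: value for label, value in page.items() if label in labels}


# Hypothetical product pages from the same catalog template.
pages = [
    {"Part Number": "A-100", "Price": "$4.20", "Lead Time": "3 days",
     "Copyright": "Example Corp", "Phone": "555-0100"},
    {"Part Number": "A-200", "Price": "$5.75", "Lead Time": "7 days",
     "Copyright": "Example Corp", "Phone": "555-0100"},
    {"Part Number": "B-310", "Price": "$1.10", "Lead Time": "3 days",
     "Copyright": "Example Corp", "Phone": "555-0100"},
]

labels = learn_property_labels(pages)
records = [extract_product(page, labels) for page in pages]
```

In this sketch the copyright line and phone number are dropped because they are identical on every page, while part number, price and lead time are kept. A real system would need more than this heuristic; for example, a genuine property that happens to have the same value for every product would be discarded here.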

What are the benefits of Focused Crawlers?

  • Focused Crawlers do not rely on sitemaps, making them resilient to site changes and enabling them to scale easily to collect product information from hundreds or thousands of web sites with little manual intervention. This lowers the cost of ownership for an organization.
  • Focused Crawlers are designed specifically to identify and extract only product data from web pages, making them highly precise. They are able to identify multiple products on the same page and to recognize when information about the same product is spread across multiple pages. This ability reduces time to market for organizations through improved access to component part data.

How can I learn more?
For additional information about Focused Crawlers, please contact us or call 631-371-8100.