Open Source Issue Tracking Scraper
IssueScraper is a generic tool for pulling issue tracking information from external sites. Because it works by scraping web pages, it can support most issue tracking systems. Out of the box, it supports the Jira, Google and SourceForge issue trackers. You define URLs that describe queries into an issue tracking system; IssueScraper fetches the content at those URLs and parses it to obtain the issue information.
Each project that you want to track needs to have a scraper defined. The scraper can then be configured to retrieve certain named queries. For example, you can set up queries that return all open issues, recently opened issues, and so on. You can define any query the external tracking system supports.
Processing Pipeline and Content Intercepting
To handle different types of content, IssueScraper uses a generic parsing pipeline that accepts interceptors at each point in the processing. There are three stages. First, a stream is retrieved from the external site. The stream is then converted into a String object, and the String is finally parsed into a DOM object. Interceptors can be injected after the stream is received, after it has been converted into a String, or after it has been parsed into a DOM.
Note
The content is expected to be parsable into a DOM object by the time it leaves the String stage. This can be accomplished by inserting the JTidyInterceptor at the stream stage.
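The three stages described above can be sketched roughly as follows. This is a minimal illustration, not IssueScraper's actual implementation: the class `PipelineSketch` and the methods `afterStream`, `afterString` and `afterDom` are hypothetical names, the standard JAXP parser stands in for whatever parser the tool uses, and the sample markup is invented. Note that the cleanup step here runs at the String stage purely to keep the example dependency-free, whereas the JTidyInterceptor mentioned above is inserted at the stream stage.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical three-stage pipeline; these are not IssueScraper's real classes. */
public class PipelineSketch {
    // One interceptor list per stage, applied in registration order.
    private final List<UnaryOperator<InputStream>> streamInterceptors = new ArrayList<>();
    private final List<UnaryOperator<String>> stringInterceptors = new ArrayList<>();
    private final List<UnaryOperator<Document>> domInterceptors = new ArrayList<>();

    public PipelineSketch afterStream(UnaryOperator<InputStream> interceptor) {
        streamInterceptors.add(interceptor);
        return this;
    }

    public PipelineSketch afterString(UnaryOperator<String> interceptor) {
        stringInterceptors.add(interceptor);
        return this;
    }

    public PipelineSketch afterDom(UnaryOperator<Document> interceptor) {
        domInterceptors.add(interceptor);
        return this;
    }

    public Document process(InputStream in) {
        try {
            // Stage 1: the raw stream, as received from the external site.
            for (UnaryOperator<InputStream> interceptor : streamInterceptors) {
                in = interceptor.apply(in);
            }
            // Stage 2: stream -> String (UTF-8 is an assumption of this sketch).
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            for (UnaryOperator<String> interceptor : stringInterceptors) {
                text = interceptor.apply(text);
            }
            // Stage 3: String -> DOM. The text must be well formed by now.
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
            for (UnaryOperator<Document> interceptor : domInterceptors) {
                dom = interceptor.apply(dom);
            }
            return dom;
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Content could not be processed", e);
        }
    }

    /** Runs the pipeline on invented sample markup and returns the root's text. */
    public static String demo() {
        // "&nbsp;" is not a predefined XML entity, so parsing fails without cleanup.
        String page = "<issues><issue id=\"UP-1\">open&nbsp;bug</issue></issues>";
        Document dom = new PipelineSketch()
                .afterString(s -> s.replace("&nbsp;", " "))
                .process(new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));
        return dom.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping a separate interceptor list per stage means each interceptor only needs to understand one representation of the content (bytes, text or DOM), which is presumably why the pipeline exposes all three injection points.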
Code Examples
IIssue issue = null;
List<IIssue> issues = null;
// The Spring Framework wires up the system, so get the BeanFactory first
BeanFactory beanFactory = new XmlBeanFactory(new ClassPathResource("issue-scraper-beans.xml"));
// Get the issue tracker manager
IIssueTrackerManager manager = (IIssueTrackerManager)beanFactory.getBean("issueTrackerManager", IIssueTrackerManager.class);
// Retrieves an issue by URL
// The URL must "belong" to a project defined in the manager.
// A URL belongs to a project if it matches both the issueUrlExpression and issueIdExpression regular expressions
issue = manager.getIssue("http://www.ja-sig.org/issues/browse/UP-1", true);
// Retrieves an issue by project and issue ID
issue = manager.getIssue("uPortal", "UP-1", true);
// Fetch all open issues
// The open named query is defined in the bean definition file for the "uPortal" scraper.
issues = manager.getIssuesFromNamedQuery("uPortal", "open");
// Fetch all recent issues
// The recent named query is defined in the bean definition file for the "uPortal" scraper.
issues = manager.getIssuesFromNamedQuery("uPortal", "recent");
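The rule stated in the comments above, that a URL belongs to a project when it matches both the issueUrlExpression and issueIdExpression regular expressions, might look something like the sketch below. The class `ProjectUrlMatcher`, the method `issueIdFor` and both patterns are hypothetical, and it is an assumption of this sketch that "matches" means a partial match and that the ID expression captures the issue ID in its first group; the real scraper may behave differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of matching an issue URL to a project. */
public class ProjectUrlMatcher {
    private final Pattern issueUrlExpression;
    private final Pattern issueIdExpression;

    public ProjectUrlMatcher(String issueUrlRegex, String issueIdRegex) {
        this.issueUrlExpression = Pattern.compile(issueUrlRegex);
        this.issueIdExpression = Pattern.compile(issueIdRegex);
    }

    /** Returns the issue ID if the URL matches both expressions, otherwise empty. */
    public Optional<String> issueIdFor(String url) {
        if (!issueUrlExpression.matcher(url).find()) {
            return Optional.empty();
        }
        Matcher id = issueIdExpression.matcher(url);
        if (!id.find()) {
            return Optional.empty();
        }
        // Prefer the first capture group; fall back to the whole match.
        return Optional.of(id.groupCount() > 0 ? id.group(1) : id.group());
    }

    public static void main(String[] args) {
        // Both patterns are invented for this example.
        ProjectUrlMatcher uPortal = new ProjectUrlMatcher(
                "^http://www\\.ja-sig\\.org/issues/browse/",
                "/browse/(UP-\\d+)$");
        System.out.println(uPortal.issueIdFor("http://www.ja-sig.org/issues/browse/UP-1").orElse("no match"));
        System.out.println(uPortal.issueIdFor("http://example.org/browse/UP-1").orElse("no match"));
    }
}
```

Under these assumptions the first call yields the ID `UP-1` and the second yields no match, because the host fails the URL expression even though the ID expression alone would succeed.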
Spring Configuration
The test bean configuration file describes the various configuration options and gives three examples for Jira, Google and SourceForge issue tracking systems.
For the three currently supported systems (Jira, Google and SourceForge), you will not need to modify the parser configurations. You will need to configure one DefaultProjectIssueScraper for each project you want to track. Then wire the scrapers into an IssueTrackerManager instance and access them through it.
| Attachment | Size |
|---|---|
| issueScraper.jar | 24.91 KB |
| issue-scraper-beans.xml | 36.44 KB |
| issue-scraper-test-beans.xml | 30.02 KB |
