WebScraper uses the Integrity v8 engine to quickly scan a website, and can currently output the extracted data as CSV or JSON. It can also download images to a folder.
- Easy to scan a site - just enter the starting URL and press "Go"
- Easy to export - choose the columns you want
- Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or entire content in a number of formats (html, plain text, markdown)
- Since v4.1, can download all discovered images to a folder
- Configuration of various limits on the crawl and the output file size
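To give a feel for what class-based and regular-expression extraction mean in practice, here is a minimal Python sketch using only the standard library. It is not WebScraper's own code or engine; the names `ClassTextExtractor`, `extract_by_class`, `extract_by_regex` and the sample HTML are hypothetical and exist only for illustration.

```python
import re
from html.parser import HTMLParser

# Tags that never have a closing tag; skipped so nesting depth stays correct.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the plain text of every element carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1   # nested tag inside a match
        elif self.wanted_class in classes:
            self.depth = 1    # entering a matching element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:   # leaving the matching element
            self.results.append("".join(self._buffer).strip())
            self._buffer = []

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract_by_class(html, wanted_class):
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    return parser.results


def extract_by_regex(html, pattern):
    # Returns the captured group(s) for every match in the page source.
    return re.findall(pattern, html)


SAMPLE = (
    '<div class="product"><span class="price">$19.99</span>'
    '<p class="name featured">Blue <b>Widget</b></p></div>'
)

names = extract_by_class(SAMPLE, "name")
prices = extract_by_regex(SAMPLE, r"\$(\d+\.\d{2})")
```

The class-based approach follows the page structure, while the regular expression matches raw text patterns regardless of markup; which one fits better depends on how consistent the site's HTML is.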
What's New in Version 4.7.0:
- Adds a new tab, 'Post process output file'. A couple of options relating to the CSV file have moved there (splitting multiple values onto separate rows, and splitting the output file into 64k-row chunks).
- Adds a new option to the 'post process' tab: 'remove rows where this column is empty...'.
- Adds a new option, 'stop at X rows in the results', which is more relatable than the existing 'stop at X links'. The latter is a safety valve and is still present; it should contain a number bigger than the number of links on the site you're scanning. The default of 200,000 should be fine, but add a zero if necessary.
- When a large CSV file was split into parts of at most 64k rows, files after the first one didn't contain headings. They do now.
- Temp files are cleaned up when WebScraper quits normally.
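The post-processing steps mentioned above (dropping rows with an empty column, splitting multiple values onto separate rows, and splitting the output into parts that each carry the headings) can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's implementation: the function names, the `|` separator for multiple values, and the sample rows are all hypothetical, and the chunk size is left as a parameter.

```python
import csv
import io


def drop_empty(rows, column):
    """Keep only rows whose value in `column` is non-empty."""
    return [r for r in rows if (r.get(column) or "").strip()]


def split_multi_values(rows, column, sep="|"):
    """Turn one row holding several values into one row per value.

    The separator is an assumption made for this sketch.
    """
    out = []
    for row in rows:
        for value in row[column].split(sep):
            out.append({**row, column: value.strip()})
    return out


def write_chunks(rows, fieldnames, max_rows):
    """Return a list of CSV texts, each with a header and at most max_rows rows."""
    chunks = []
    for start in range(0, len(rows), max_rows):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()  # every part gets the headings, not just the first
        writer.writerows(rows[start:start + max_rows])
        chunks.append(buf.getvalue())
    return chunks


rows = [
    {"url": "a", "price": "1"},
    {"url": "b", "price": ""},
    {"url": "c", "price": "3|4"},
]

cleaned = split_multi_values(drop_empty(rows, "price"), "price")
parts = write_chunks(cleaned, ["url", "price"], max_rows=2)
```

With a real export the same idea would apply with a much larger `max_rows`; a tiny value is used here only so the split is visible on three rows.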
- Title: WebScraper 4.7.0
- Developer: PeacockMedia
- Compatibility: OS X 10.8 or later, 64-bit processor
- Language: English
- Includes: K'ed by TNT
- Size: 6.11 MB