v1.0.9 #66
Replies: 3 comments
I'm eagerly looking forward to the release of the Linux x64 AppImage of the v1.0.9 GUI version (Linux Mint user here)! Thank you for continuing to improve and provide this awesome website crawler/exporter.
Newbie question: what's the best/recommended way to update 1.0.8 to 1.0.9 under Linux Mint without trashing existing configs? Thanks.
Hey 👋🏼 @janreges. I spent some time today with this crawler's command line and GUI, and I just wanted to say: really nice work. Thank you!
This release introduces a powerful new Website to Markdown converter, allowing you to export entire websites into clean, single or multiple Markdown files, which is ideal for AI context or documentation purposes. We've also added the ability to start crawling directly from a `sitemap.xml` file, and we significantly enhanced the Offline Website Exporter with more granular control and better handling of international characters. Numerous new command-line options have been added for greater flexibility in crawling, filtering, and reporting, alongside many other improvements and bug fixes.

### New Features
- New Website to Markdown converter (`html2markdown`).
- `--markdown-export-single-file` to combine all website content into a single, organized Markdown file, with smart removal of duplicate headers/footers (see the sketch after this list).
- Pass a `sitemap.xml` or sitemap index file directly to the `--url` parameter to crawl all listed URLs.
- `--resolve` option (like `curl`) to provide custom IP addresses for specific domains and ports.
- New `--extra-columns` option.
- `--max-depth` parameter for limiting how deep the crawler goes (for pages, not assets).
- `--html-report-options` to select which sections to include in the final HTML report.
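As a quick illustration of how a few of these options fit together, here is a minimal sketch. The `./crawler` launcher name, the `--option=value` syntax, and the `host:port:address` value format for `--resolve` (mirroring `curl`) are assumptions, not confirmed by these notes; check `--help` on your install.

```bash
# Export a whole site into a single, organized Markdown file
# (whether the flag also accepts an output path is not stated in the notes):
./crawler --url=https://example.com/ --markdown-export-single-file

# Start the crawl from a sitemap (or sitemap index) and limit page depth:
./crawler --url=https://example.com/sitemap.xml --max-depth=2

# Map a domain to a custom IP address and port, like curl's --resolve:
./crawler --url=https://example.com/ --resolve=example.com:443:127.0.0.1
```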
### Improvements

- New `--offline-export-remove-unwanted-code` option to automatically strip analytics, cookie consents, and other non-essential scripts (see the sketch after this list).
- New `--offline-export-no-auto-redirect-html` flag to prevent the creation of meta-refresh redirect files.
- `--transform-url` to internally change request URLs, useful for crawling sites that serve content from a different domain (e.g., a local instance).
- New `--max-non200-responses-per-basename` option to prevent getting stuck in loops with dynamically generated error pages.
- `--timezone` for all dates and times displayed in reports and used in exported filenames.
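A similarly hedged sketch of the enhanced Offline Website Exporter. The `--offline-export-dir` parameter and the exact value syntax of the new flags are assumptions based on the tool's usual `--option=value` style, not taken from these notes.

```bash
# Offline export that strips analytics/cookie-consent scripts, skips the
# creation of meta-refresh redirect files, and renders report times in a
# chosen timezone (Europe/Prague is just an example IANA timezone name):
./crawler --url=https://example.com/ \
  --offline-export-dir=./example.com-offline \
  --offline-export-remove-unwanted-code \
  --offline-export-no-auto-redirect-html \
  --timezone=Europe/Prague
```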
This discussion was created from the release v1.0.9.