Commit f2571d0

v0.3
1 parent 18e965f commit f2571d0

5 files changed, with 167 additions and 5 deletions

README.md

Lines changed: 100 additions & 5 deletions
@@ -1,11 +1,106 @@
1-
# Horseman
1+
# 👋
22

3-
This repository is for tracking issues and feature requests for the Horseman crawler.
3+
This repository is for detailed update notes, and for tracking issues and feature requests for the [Horseman](https://gethorseman.app) crawler.
44

55
https://github.com/workeffortwaste/horseman/issues
66

7-
👀
7+
# Horseman v0.3
88

9-
## Current Version
9+
**tl;dr** 🕷️ *Crawl the web with GPT-3.5 and use page content with prompts. 🤖 Don't know JS? Create snippets with an AI helper instead. 🌌 Deeper exploration with the new Insights feature. 🤯 A huge number of new snippets and much more.*
1010

11-
The latest version is v0.2.4 for Windows, MacOS (M1 and Intel), and Linux.
11+
## Patch Notes
12+
13+
Let's get the obligatory patch notes out of the way. A lot of bugs have been squashed, elements polished, and many edge cases smoothed over.
14+
15+
## New Features
16+
17+
With that out of the way, let’s talk about the big new features in Horseman v0.3.
18+
19+
### OpenAI GPT-3.5 Integration
20+
21+
GPT-3.5 has been integrated directly into Horseman in a couple of ways. 👇
22+
23+
### Prompts In Snippets
24+
25+
**Requires an OpenAI API key (add yours under the main Settings menu).**
26+
27+
Query GPT with any prompt by returning the new `prompt` property from your snippet. Combine any piece of page data, or send the entire page to GPT for analysis.
28+
29+
Example snippets using page content have been added to the *New* button in the Editor. You’ll also find newly created built-in snippets in the library to rewrite meta descriptions, write missing meta descriptions, and summarize page content.
30+
31+
```jsx
32+
/* horseman-config enable-openai */
33+
34+
/* Generate a beautiful poem from the meta description using GPT */
35+
36+
/* Fetch the meta description */
37+
const meta = document.querySelector('meta[name="description"]')?.getAttribute('content')
38+
39+
/* Skip the snippet if the page doesn't have a meta description */
40+
if (!meta) return
41+
42+
/* Ask gpt-3.5-turbo to generate a poem */
43+
return {
44+
prompt: `Rewrite the following as a short 160 character poem: ${meta}`
45+
}
46+
```
47+
48+
Further basic examples can be found in the `examples` folder in this repository.
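
As a rough sketch of combining several pieces of page data in one prompt, the helper below builds the `prompt` return object from a title and a meta description. The function name `buildPrompt` and the prompt wording are hypothetical and not part of Horseman; only the `prompt` property and the `enable-openai` directive come from the notes above.

```javascript
/* horseman-config enable-openai */

/* Hypothetical helper: build a prompt from several pieces of page data */
function buildPrompt ({ title, description }) {
  /* Skip when there is nothing to send */
  if (!title && !description) return undefined

  return {
    prompt: `Suggest a better meta description for a page titled "${title ?? ''}". ` +
      `Current description: ${description ?? 'none'}`
  }
}

/* Inside a Horseman snippet the last step would be something like:
   return buildPrompt({
     title: document.title,
     description: document.querySelector('meta[name="description"]')?.getAttribute('content')
   })
*/
```

Keeping the prompt construction in a plain function makes it easy to check the wording outside the crawler before spending API calls on a full crawl.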
49+
50+
### No-Code AI Helper
51+
52+
**Does not require an OpenAI API key.**
53+
54+
Don’t know JavaScript? That’s no longer an issue. You’ll find a new button in the Editor, *No-Code AI*, which will use the power of GPT to automatically create simple Horseman snippets from a description.
55+
56+
Entering a brief description such as `the social meta image` will instantly create a snippet that fetches and returns the image used for social shares from the Open Graph data.
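
For illustration only, a snippet produced from that description might look roughly like the sketch below. The actual output of the helper is not shown in these notes, so the function name `socialImage` and its shape are assumptions; the `og:image` meta property is the standard Open Graph field for share images.

```javascript
/* Hypothetical output for the description "the social meta image" */
function socialImage (doc) {
  /* Open Graph stores the share image in <meta property="og:image"> */
  const tag = doc.querySelector('meta[property="og:image"]')

  /* Return nothing when the page has no Open Graph image */
  return tag ? tag.getAttribute('content') : undefined
}

/* In a Horseman snippet the last line would be: return socialImage(document) */
```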
57+
58+
## Insights
59+
60+
A fully explorable report showing statistics about the number of failing or passing pages per snippet. *Think of it as your own customisable audit.*
61+
62+
All built-in snippets have been updated to support this feature (where it makes sense), and your own snippets can use it with ease. Any snippet returning a `pass` / `fail` string (or the new `state` property) can enable Insights with the new configuration helper or the `enable-insights` directive.
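
A minimal sketch of a snippet that could feed Insights, assuming the `enable-insights` directive is written in the same `horseman-config` comment form as the other directives. The check itself (`canonicalCheck`) is a hypothetical example, not a built-in snippet.

```javascript
/* horseman-config enable-insights */

/* Hypothetical check: does the page declare a canonical URL? */
function canonicalCheck (doc) {
  const link = doc.querySelector('link[rel="canonical"]')

  /* Insights counts pages by the 'pass' / 'fail' string returned */
  return link && link.getAttribute('href') ? 'pass' : 'fail'
}

/* In a Horseman snippet the last line would be: return canonicalCheck(document) */
```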
63+
64+
## States
65+
66+
You are no longer limited to marking a snippet as passing or failing by returning only a `pass` / `fail` string. With the new `state` property of the return object, you can show the state alongside the data in the results table.
67+
68+
```jsx
69+
/* horseman-config enable-state */
70+
71+
/* Using the new state property to send a fail state if num is more than 1 */
72+
const num = 2
73+
74+
return {
75+
cell: num,
76+
state: num > 1 ? 'fail' : 'pass'
77+
}
78+
```
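
A slightly more realistic sketch of the same idea: report a page's title length in the cell and fail it when the title is empty or too long. The function name `titleLengthState` and the 60 character limit are assumptions chosen for illustration; `cell` and `state` are the properties shown in the example above.

```javascript
/* horseman-config enable-state */

/* Hypothetical check: report the title length and fail when it is empty or too long */
function titleLengthState (title, maxLength = 60) {
  const length = (title ?? '').trim().length

  return {
    cell: length,
    state: length > 0 && length <= maxLength ? 'pass' : 'fail'
  }
}

/* In a Horseman snippet the last line would be: return titleLengthState(document.title) */
```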
79+
80+
## Deeper Explorations
81+
82+
Explore additional data more easily by viewing it per snippet instead of just by page. Hover over the column header for a snippet with explore enabled to find the new option.
83+
84+
What’s more, when viewing the additional table, hover over a column header to filter the results by unique values.
85+
86+
## Configuration Helper
87+
88+
As the features of Horseman have grown, trying to remember all the snippet directives can be a little tricky. A new configuration helper will toggle your `horseman-config` directives with no fuss.
89+
90+
## Updated Chrome
91+
92+
The version of Chromium used for crawling has been updated to v111 for a faster and more robust crawl. *My favourite new addition?* You can now use the `:has` selector in your snippets for more advanced element selection!
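
As a small sketch of what `:has` makes possible, the helper below counts links that wrap an image with a single selector. The function name `countImageLinks` is hypothetical; `a:has(img)` is standard CSS and matches every `<a>` that contains an `<img>` descendant.

```javascript
/* Hypothetical check: count links that wrap an image */
function countImageLinks (doc) {
  /* a:has(img) matches every <a> with an <img> descendant */
  return doc.querySelectorAll('a:has(img)').length
}

/* In a Horseman snippet the last line would be: return countImageLinks(document) */
```

Without `:has`, the same check needs a loop over every link and a nested `querySelector('img')` call per link.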
93+
94+
## More Data Types
95+
96+
Automatically split up CSV strings into tags with the new `data-type-tag`.
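
A sketch of preparing a CSV string for the tag data type. It assumes `data-type-tag` is enabled as a `horseman-config` directive like the others (the notes do not show the exact form), and the helper name `toTagString` is hypothetical.

```javascript
/* horseman-config data-type-tag */
/* Assumption: the directive form above mirrors the other horseman-config directives */

/* Hypothetical helper: turn a list of values into a de-duplicated CSV string */
function toTagString (values) {
  /* Trim whitespace and drop empty entries */
  const cleaned = values.map(value => value.trim()).filter(Boolean)

  /* A Set keeps the first occurrence of each value, in order */
  return [...new Set(cleaned)].join(',')
}

/* In a Horseman snippet this could return, for example, the hreflang values on the page:
   return toTagString([...document.querySelectorAll('link[hreflang]')].map(el => el.hreflang))
*/
```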
97+
98+
Visualise colours directly in your crawl with `data-type-color`: return any valid CSS colour value and it is rendered as a coloured tag in the results table.
99+
100+
## New Snippets
101+
102+
*Over 40 new snippets have been added to the library!* These include ecommerce snippets which extract product information straight from the DataLayer, AI-enhanced snippets to rewrite and analyse content, cookie consent platform discovery, and much more.
103+
104+
## Updated Website
105+
106+
I've given the website a small refresh as the tool gets much closer to v1.0. Check it out and download the latest version now: https://gethorseman.app/
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
/* horseman-config enable-openai, enable-external */
2+
3+
/**
4+
* This is an example of intelligently extracting the text content from a page
5+
* (with readability.js) and using it with a prompt.
6+
*/
7+
8+
/* Import the readability content extraction package from Skypack */
9+
const pkg = (await import('https://cdn.skypack.dev/@mozilla/readability')).default
10+
11+
/* Create a clone of the document for readability to modify */
12+
const documentClone = document.cloneNode(true);
13+
14+
/* Remove any elements from the clone that we definitely don't want to extract content from */
15+
['header', 'footer', 'nav'].forEach(selector => {
16+
documentClone.querySelectorAll(selector).forEach(element => { element.remove() })
17+
})
18+
19+
/* Extract the page content from the clone using readability */
20+
const content = (new pkg.Readability(documentClone).parse()).textContent
21+
22+
/* Ask gpt-3.5-turbo to summarize the content */
23+
return {
24+
prompt: `Summarize the following content: ${ content }`
25+
}

examples/gpt__prompt__raw_html.js

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
/* horseman-config enable-openai */
2+
3+
/**
4+
* This is a simple example of using the raw HTML content with a prompt.
5+
*/
6+
7+
/* Fetch the raw HTML */
8+
const html = document.documentElement.outerHTML
9+
10+
/* Ask gpt-3.5-turbo to do something with the HTML */
11+
return {
12+
prompt: `Do something with the following HTML: ${ html } `
13+
}

examples/gpt__prompt__raw_text.js

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
/* horseman-config enable-openai */
2+
3+
/**
4+
* This is a simple example of using the raw text content with a prompt.
5+
*/
6+
7+
/* Fetch the raw text */
8+
const text = document.body.innerText
9+
10+
/* Ask gpt-3.5-turbo to do something with the text */
11+
return {
12+
prompt: `Do something with the following text: ${ text } `
13+
}
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
/* horseman-config enable-openai */
2+
3+
/**
4+
* This is a simple example of using the content from a single HTML element with a prompt.
5+
*/
6+
7+
/* Fetch the meta description */
8+
const meta = document.querySelector('meta[name="description"]')?.getAttribute('content')
9+
10+
/* Skip the snippet if the page doesn't have a meta description */
11+
if (!meta) return
12+
13+
/* Ask gpt-3.5-turbo to generate a poem */
14+
return {
15+
prompt: `Rewrite the following as a short 160 character poem: ${ meta } `
16+
}
