Hi,
I am trying to integrate a simple Simplescraper recipe with Parabola. The recipe is a crawl with a set of URLs added in the “Crawl” tab. When I run the API from Parabola, it only scrapes the first URL and does not scrape all the URLs configured in the “Crawl” tab. How can I trigger the full crawl instead of just getting the response data back for the first URL?
Has anyone tried a Simplescraper crawl using the “Pull from API” option in Parabola? Any input would be highly appreciated.
Thanks
Jillian
Hey Jillian,
If you’re trying to use the Simplescraper API on multiple URLs within one flow, I think you would want to use the “Enrich with API” step instead of the “Pull from API” step.
First, go to the dashboard of the recipe you’ve created in Simplescraper. Grab the API URL that you see under the API tab; that’s what goes in the main URL field of the Enrich with API step.
At the bottom of that API tab in Simplescraper is a URL parameter for feeding different URLs into the endpoint. You’ll add that as a parameter in the Enrich with API step in Parabola.
You’ll also need a step in Parabola before the Enrich step, something like a Google Sheet where one column is the list of URLs you want to scrape.
I’ll try to follow up with more detail if I can in a bit, but hopefully this gets you started.
To follow up, this is how the Enrich step should look in Parabola:
This is where you get that “API Endpoint URL” in Simplescraper:
Then just connect a previous step that has a column listing the URLs you want to scrape, and use that column as the value for the source_url parameter.
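In case it helps to picture what’s happening under the hood, here’s a rough Python sketch of what the Enrich step effectively does for each row. The endpoint URL and the example URLs are placeholders; the only detail taken from the setup above is the source_url parameter, so double-check the exact names on your own API tab:

```python
import requests

# Placeholder -- paste the exact "API Endpoint URL" from your recipe's
# API tab in Simplescraper here (it may already include your key).
API_ENDPOINT = "https://simplescraper.io/api/YOUR_RECIPE_ID"

# In Parabola this list comes from the previous step (e.g. a Google
# Sheets column of URLs); it's hard-coded here just for illustration.
urls_to_scrape = [
    "https://example.com/page-1",
    "https://example.com/page-2",
]

for url in urls_to_scrape:
    # The Enrich with API step makes roughly one request like this per
    # row, passing that row's URL as the source_url parameter.
    response = requests.get(API_ENDPOINT, params={"source_url": url})
    response.raise_for_status()
    # Print the scraped data, assuming the endpoint returns JSON.
    print(response.json())
```

So the key idea is that each row in your URL column becomes its own request, which is why the Enrich step scrapes every URL rather than just the first one.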
Hi,
Thank you for sharing this. The “Crawl” tab in Simplescraper already has the list of URLs configured for my recipe. Is it possible to pull those?
Regards,
Jillian
Hi Jillian,
I’m not sure if it’s technically possible. You would either need to scrape the Crawl tab itself (I don’t think this is possible, though maybe a tool other than Simplescraper could do it), or Simplescraper would need to provide that data via an API (which I don’t think they do).
I think your best options are:
- Copy those URLs from the “Crawl” tab and put them in a Google Sheet, which you can then connect as a step before your Enrich step.
or
- Run the scrape within Simplescraper itself. They have an integration for pushing the data straight into a Google Sheet.
Of course, it depends on exactly what you’re trying to do, but hopefully one of those options works for you.
Hi Brian,
Thank you for your advice. Yes, I was able to use option 2 as a workaround, but I wanted to check whether Parabola could do it as a single API pull from Simplescraper instead of putting Google Sheets in the middle.
Here is the flow I was able to get working:
Simplescraper crawl (crawls the list of URLs) → writes the data to Google Sheets via the built-in integration → Parabola (pulls data from Google Sheets and does data cleanup) → writes data to an Airtable table.
Regards,
Jillian
How are you getting the list of URLs into the Crawl section of Simplescraper? Does that list exist somewhere else before you put it in Simplescraper?
Hi Brian,
No, the list does not exist anywhere else. I add the URLs I want directly into Simplescraper and then have it crawl them. The list of URLs may change in the future.
Regards,
Jillian