Hi,
I am trying to integrate a simple Simplescraper recipe with Parabola. The recipe is a crawl with a set of URLs added in the “Crawl” tab. When I run the API from Parabola, it only scrapes the first URL and does not scrape all the URLs configured in the “Crawl” tab. How can I trigger the full crawl instead of just getting the response data back for the first URL?
Has anyone tried a Simplescraper crawl using the “Pull from API” option in Parabola? Any input would be highly appreciated.
Thanks
Jillian
Hey Jillian,
If you’re trying to use the Simplescraper API on multiple URLs within one flow, I think you would want to use the “Enrich with API” step instead of the “Pull from API” step.
First, go to the dashboard of the recipe you’ve created in Simplescraper. Grab the API URL that you see under the API tab; that’s what goes in the main URL field of the Enrich with API step.
At the bottom of that API tab in Simplescraper is a URL parameter for feeding different URLs into the endpoint. You’ll add that as a parameter in the Enrich with API step in Parabola.
You’ll also need a step in Parabola before the Enrich step, something like a Google Sheet where one column is the list of URLs you want to scrape.
I’ll try to follow up with more detail if I can in a bit, but hopefully this gets you started.
To follow up, this is how the Enrich step should look in Parabola:
This is where you get that “API Endpoint URL” in Simplescraper:
Then just connect a previous step that has a column listing the URLs you want to scrape, and use that column as the value for the source_url parameter.
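In case it helps to picture what’s happening under the hood, here’s a rough Python sketch of what the Enrich step effectively does for each row. The endpoint URL and the example URLs are placeholders; the only detail taken from the setup above is the source_url parameter, so double-check the exact names on your own API tab:

```python
import requests

# Placeholder -- paste the exact "API Endpoint URL" from your recipe's
# API tab in Simplescraper here (it may already include your key).
API_ENDPOINT = "https://simplescraper.io/api/YOUR_RECIPE_ID"

# In Parabola this list comes from the previous step (e.g. a Google
# Sheets column of URLs); it's hard-coded here just for illustration.
urls_to_scrape = [
    "https://example.com/page-1",
    "https://example.com/page-2",
]

for url in urls_to_scrape:
    # The Enrich with API step makes roughly one request like this per
    # row, passing that row's URL as the source_url parameter.
    response = requests.get(API_ENDPOINT, params={"source_url": url})
    response.raise_for_status()
    # Print the scraped data, assuming the endpoint returns JSON.
    print(response.json())
```

So the key idea is that each row in your URL column becomes its own request, which is why the Enrich step scrapes every URL rather than just the first one.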
Hi,
Thank you for sharing this. The “Crawl” tab in Simplescraper already has the list of URLs configured for my recipe. Is it possible to pull those?
Regards,
Jillian
Hi Jillian,
I’m not sure if it’s technically possible. You would either need to scrape the Crawl tab itself (I don’t think this is possible, though maybe a tool other than Simplescraper could do it), or Simplescraper would need to provide that data via an API (which I don’t think they do).
I think your best options are:
- Copy those URLs from the “Crawl” tab and put them in a Google Sheet, which you can then connect as a step before your Enrich step.
or
- Run the scrape within Simplescraper itself. They have an integration for pushing the data straight into a Google Sheet.
Of course, it depends on exactly what you’re trying to do, but hopefully one of those options works for you.
Hi Brian,
Thank you for your advice. Yes, I was able to use option 2 as a workaround, but I wanted to check whether Parabola could do it as a single API pull from Simplescraper instead of putting Google Sheets in the middle.
Here is the flow I was able to get working:
Simplescraper crawl (crawls the list of URLs) → writes the data to Google Sheets via the built-in integration → Parabola (pulls data from Google Sheets and does data cleanup) → writes data to an Airtable table.
Regards,
Jillian
How are you getting the list of URLs into the Crawl section of Simplescraper? Does that list exist somewhere else before you put it in Simplescraper?
Hi Brian,
No, the list does not exist anywhere else. I add the URLs I want directly into Simplescraper and then have it crawl them. The list of URLs may change in the future.
Regards,
Jillian