Ah, I wasn’t aware I could use it this way, @thisthatjosh. I was wrong! Cool!
How can we tell it to retrieve only the content from the site, and not all the excess HTML? No CSS, no extra markup, etc.
I made a test flow, and the content is too long. I get the following error:
> "message": "This model's maximum context length is 16385 tokens. However, your messages resulted in 22339 tokens. Please reduce the length of the messages."
You have two options here; both involve utilising the code piece!
Install an npm package such as cheerio (npm install cheerio). You can then use it to extract a specific section from the page data gathered in the GET request.
For example, if the web page’s main content lives in a div with the id main-content, you could do something like the following:
export const code = async (params) => {
  // Import the 'cheerio' library.
  const cheerio = require('cheerio');
  // Load the fetched HTML (assumed to be stored in params.htmlString via your Key:Value pair)
  // into cheerio. The '$' is a cheerio instance ready to query the DOM, much like jQuery.
  const $ = cheerio.load(params.htmlString);
  // Select the element with the ID 'main-content' and get its inner HTML as a string.
  // If no '#main-content' element exists, '.html()' returns null.
  const mainContent = $('#main-content').html();
  return mainContent;
};
Or, if you prefer plain JavaScript, you MIGHT be able to use the DOM API to extract the content, with something like the following:
export const code = async (params) => {
  // Create a DOMParser, which turns an HTML string into a Document object
  // that we can query for specific parts of the webpage.
  const parser = new DOMParser();
  // Parse the fetched HTML (assumed to be stored in params.htmlString via your Key:Value pair).
  const doc = parser.parseFromString(params.htmlString, 'text/html');
  // Ask the Document for the element with the ID 'main-content'.
  const mainContent = doc.querySelector('#main-content');
  // If it was found, take its inner HTML; otherwise return null.
  const mainContentHTML = mainContent ? mainContent.innerHTML : null;
  return mainContentHTML;
};
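One caveat: DOMParser is a browser API and isn’t available in Node.js out of the box, so the snippet above may fail depending on where the code piece runs. As a rough, dependency-free fallback you could strip the markup yourself. This is a quick sketch, not a robust HTML parser, and again assumes the page HTML is in params.htmlString:

```javascript
// Strip <script> and <style> blocks (the "excess" CSS/JS), then drop the
// remaining tags, leaving only the visible text. Regexes are a blunt tool
// for HTML, so treat this as a best-effort fallback.
const htmlToText = (htmlString) => htmlString
  .replace(/<script[\s\S]*?<\/script>/gi, '')
  .replace(/<style[\s\S]*?<\/style>/gi, '')
  .replace(/<[^>]+>/g, ' ')
  .replace(/\s+/g, ' ')
  .trim();

// In the code piece:
// export const code = async (params) => htmlToText(params.htmlString);
```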
I haven’t tested this, so you might need to fiddle around with it, and you will obviously need to change the selector to the class or ID you want to target. I did something similar to extract the meta description from the page:
export const code = async (params) => {
  // Create a DOMParser, which turns an HTML string into a Document object.
  const parser = new DOMParser();
  // Parse the fetched HTML (stored in params.htmlString).
  const doc = parser.parseFromString(params.htmlString, 'text/html');
  // Find the meta element whose name attribute is "description".
  const metaDescriptionElement = doc.querySelector('meta[name="description"]');
  // Read its content attribute to get the description text, or null if the element doesn't exist.
  const metaDescription = metaDescriptionElement ? metaDescriptionElement.getAttribute('content') : null;
  return metaDescription;
};
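Whichever approach you use, a very long page can still blow past the limit even after extraction. As a last resort you could truncate the text before handing it to the model. The 4-characters-per-token figure below is only a common rule of thumb, not an exact count:

```javascript
// Hypothetical guard: cap the text at an approximate token budget.
// Roughly 4 characters per token is an approximation for English text.
const truncateToTokenBudget = (text, maxTokens) => {
  const maxChars = maxTokens * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
};

// e.g. stay well under the 16385-token limit from the error message:
// const safeText = truncateToTokenBudget(mainContent, 12000);
```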