How to process a long HTML text with OpenAI? [Divide into smaller batches because of token count]

I have some long HTML texts I want to add spintax to. GPT-4o-mini does this very well, but its output is limited to 4096 tokens.

So I would like a flow where I can input a long text (for instance 5000 words) and have the flow chunk it into batches of around 400-500 words.

Then loop through and process each chunk with OpenAI, and finally put them back together into one big article with spintax, keeping all HTML markup intact.

How can I divide a huge text into smaller chunks without splitting in the middle of a sentence?
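
Here is roughly what I have in mind for the chunking step (a minimal Python sketch; the regex sentence split and the 450-word limit are my own placeholder assumptions, and a tag-aware splitter may be needed if the HTML contains periods inside tags or attributes):

```python
import re

def chunk_text(text: str, max_words: int = 450) -> list[str]:
    """Split text into chunks of up to max_words words,
    breaking only at sentence boundaries."""
    # Naive sentence split: a period, question mark, or exclamation
    # point followed by whitespace. HTML tags are left untouched.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk once the next sentence would overflow it.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```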

And how do I loop through the chunks until the last one is processed?
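
For the loop, something like this sketch is what I'm imagining (assuming the official `openai` Python SDK; `add_spintax`, the system prompt, and `chunk_text` from the sketch above are just illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def add_spintax(chunk: str) -> str:
    """Send one chunk to the model and return the spintaxed version."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Add spintax to the following HTML fragment. "
                        "Keep all HTML markup exactly as it is."},
            {"role": "user", "content": chunk},
        ],
    )
    return response.choices[0].message.content

def process_article(html_text: str) -> str:
    # Process each chunk in order, then stitch the results
    # back together into one article.
    processed = [add_spintax(chunk) for chunk in chunk_text(html_text)]
    return " ".join(processed)
```

Would an approach like this work, or is there a better way to keep the markup intact across chunk boundaries?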