Hi, I'll show you how to extract the transcript of a YouTube video using the Code Piece.
Please keep in mind that this is a pretty rough experiment.
1st step: Get the full YouTube video URL (not just the video ID).
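Since the code passes inputs.videourl straight to yt-dlp, a quick check at the top of code() makes the "full URL, not the ID" requirement explicit. This is a minimal sketch: assertYoutubeUrl is a hypothetical helper, not part of the original setup, and the list of accepted hostnames is an assumption you may need to extend.

```typescript
// Hypothetical helper (not from the original post): fail early with a clear
// message when the input is a bare video ID or a non-YouTube URL.
// The accepted hostnames below are an assumption; extend them if needed.
const YOUTUBE_HOSTS = new Set([
  'youtube.com',
  'www.youtube.com',
  'm.youtube.com',
  'youtu.be',
]);

export function assertYoutubeUrl(input: string): string {
  let url: URL;
  try {
    // The WHATWG URL constructor throws on strings like "dQw4w9WgXcQ"
    url = new URL(input.trim());
  } catch {
    throw new Error(`Expected a full YouTube URL, got: ${input}`);
  }
  if (!YOUTUBE_HOSTS.has(url.hostname)) {
    throw new Error(`Not a YouTube URL: ${input}`);
  }
  return url.toString();
}
```

You would call it as the first line of code(), for example const url = assertYoutubeUrl(inputs.videourl);, and pass url to yt-dlp instead of the raw input.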
2nd step: Create your Code Piece and add an input named videourl whose value is the URL from step 1 (the code reads it as inputs.videourl).
3rd step: Press the Full Screen button to open the editor where you can change package.json and index.ts.
4th step: Set your package.json to:
{
  "dependencies": {
    "srt": "0.0.3",
    "yt-dlp-wrap": "2.3.12",
    "node-html-parser": "6.1.12"
  }
}
5th step: Set your index.ts file to:
import YTDlpWrap from 'yt-dlp-wrap';
import { promises as fs } from 'fs';
import srt from 'srt';
import { parse } from 'node-html-parser';

export const code = async (inputs: { videourl: string }) => {
  // Download the yt-dlp binary from GitHub, then point the wrapper at it
  await YTDlpWrap.downloadFromGithub();
  const ytDlpWrap = new YTDlpWrap('./yt-dlp');

  // Fetch only YouTube's auto-generated English subtitles (no video),
  // requested as TTML and converted to SRT
  await ytDlpWrap.execPromise([
    inputs.videourl,
    '--skip-download',
    '--write-auto-subs',
    '--sub-langs', 'en',
    '--sub-format', 'ttml',
    '--convert-subs', 'srt',
  ]);

  // yt-dlp writes the .srt file to the working directory, so scan it
  const files = await fs.readdir('.');
  let toReturn = '';

  for (const file of files) {
    if (!file.endsWith('.srt')) continue;
    try {
      const data = await fs.readFile(file, 'utf-8');
      // Parse the SRT into cues; each cue carries its caption in a text field
      const cues = await srt.fromString(data);
      for (const index in cues) {
        // Strip leftover markup (e.g. <font> tags) from the caption text
        const text = parse(cues[index].text).text;
        toReturn += text + ' ';
        console.log(text);
      }
    } catch (error) {
      console.error('Error processing file:', file, error);
    }
  }

  return toReturn.trim();
};
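If the srt package gives you trouble, the parse-and-strip part can also be done without dependencies. The sketch below assumes the standard SRT layout (a cue number, a timing line containing -->, then the caption lines, with a blank line between cues). srtToPlainText is a hypothetical name, and unlike a real HTML parser it only removes tags; it does not decode entities such as &amp;.

```typescript
// Hypothetical helper: convert the contents of an .srt file into one plain-text string.
export function srtToPlainText(srtText: string): string {
  const parts: string[] = [];
  // Normalise Windows line endings, then split into cues on blank lines
  const blocks = srtText.replace(/\r\n/g, '\n').split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split('\n');
    // The caption lines follow the "00:00:01,000 --> 00:00:02,000" timing line
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) continue; // not a cue (e.g. trailing whitespace)
    for (const line of lines.slice(timingIndex + 1)) {
      // Remove inline markup such as <font color="..."> or <i>
      const clean = line.replace(/<[^>]*>/g, '').trim();
      if (clean) parts.push(clean);
    }
  }
  return parts.join(' ');
}
```

Inside the loop over files, toReturn += srtToPlainText(data) + ' '; would take the place of the srt.fromString and parse calls.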
Disclaimer: use this at your own risk.
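PS: I'm probably missing a "delete file" step. The code reads every .srt file in the working directory and never deletes them, so if you fetch several transcripts in the same environment, the output may also contain text from earlier videos.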
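A minimal sketch of that cleanup, assuming you are fine with deleting the subtitle files once they have been read. readAndDeleteSrtFiles is a hypothetical helper: it returns the contents of every .srt file in a directory and removes each file right after reading it.

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

// Hypothetical helper: read every .srt file in `dir`, then delete it so the
// next run starts from a directory with no leftover subtitles.
export async function readAndDeleteSrtFiles(dir = '.'): Promise<string[]> {
  const contents: string[] = [];
  for (const name of await fs.readdir(dir)) {
    if (!name.endsWith('.srt')) continue;
    const filePath = path.join(dir, name);
    contents.push(await fs.readFile(filePath, 'utf-8'));
    // Remove the file only after its contents have been read successfully
    await fs.unlink(filePath);
  }
  return contents;
}
```

In code(), you could call const contents = await readAndDeleteSrtFiles('.'); after yt-dlp finishes and run each entry through srt.fromString as before. Note that fs.readdir does not guarantee any particular order, so if several files are present their text may come back in any order.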