06-09-2023, 12:57 PM
Victor-P, thanks for steering me back into web scraping after a couple of years of neglect! Gintaras, thank you for LA! It seems to me that this can do quite a bit of RPA-ish stuff.
I wanted to post this PuppeteerSharp code that scrapes Hacker News for links, uses 'print.it' to show them, and then kills the browser. (I noticed that 'await page.CloseAsync()' still leaves that 'about:blank' page open, so I took the route below. I still need to figure out how to get rid of that 'about:blank' page when the browser starts, but I'm sure a little googling and fiddling will get me there in short order.)
/*/ nuget -\PuppeteerSharp; /*/ //.
using PuppeteerSharp;
script.setup(trayIcon: true, sleepExit: true);
//..
// Download the default Chromium build if it is not already cached.
using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync(BrowserFetcher.DefaultChromiumRevision);

var browser = await Puppeteer.LaunchAsync(new LaunchOptions {
    Headless = false,
    UserDataDir = @"E:\LibreAutomate\UserData" // Pick your own data dir!
});
var page = await browser.NewPageAsync();
{
    await page.GoToAsync("https://news.ycombinator.com/");
    print.it("Get all urls from page");
    // This function runs inside the page and collects the href of every link.
    var jsCode = @"() => {
        var arr = [], l = document.links;
        for(var i=0; i<l.length; i++) {
            arr.push(l[i].href);
        }
        return arr;
    }";
    var results = await page.EvaluateFunctionAsync(jsCode);
    foreach (var result in results)
    {
        print.it(result.ToString());
    }
    print.it("Finished.");
}
// Closing only the page leaves 'about:blank' open, so shut down the whole browser.
browser.Disconnect();
await browser.CloseAsync();
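A possible way around the extra 'about:blank' tab, as a sketch only: rather than opening a second tab with 'NewPageAsync()', reuse the tab the browser starts with, which 'browser.PagesAsync()' should return. The assumptions here are that Chromium has already been downloaded by the script above and that the first page in the array is the startup tab; I have not checked how this behaves with a custom 'UserDataDir'. Asking for a 'string[]' result is also just a convenience so the links come back as plain strings.

```csharp
/*/ nuget -\PuppeteerSharp; /*/
using PuppeteerSharp;

// Assumes the Chromium download from the script above has already happened.
var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = false });

// The browser normally starts with one tab (about:blank). Reuse it if it is
// there; otherwise fall back to opening a new tab.
var pages = await browser.PagesAsync();
var page = pages.Length > 0 ? pages[0] : await browser.NewPageAsync();

await page.GoToAsync("https://news.ycombinator.com/");

// Collect every link's href in the page and get the result back as strings.
var links = await page.EvaluateFunctionAsync<string[]>(
    "() => Array.from(document.links, a => a.href)");
foreach (var link in links)
{
    print.it(link);
}

await browser.CloseAsync();
```

With only one tab in play there is nothing left over to close separately, so 'browser.CloseAsync()' on its own should be enough here.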
Regards,
burque505