Posts: 95
Threads: 25
Joined: May 2023
05-14-2023, 04:31 AM
(This post was last modified: 05-31-2023, 07:41 AM by Gintaras.)
Hi Dear Gintaras,
I was worried I would be kicked off the forum for asking too many "stupid" questions.
I would like to ask whether the HtmlDoc function will be added to LA, or whether it already exists somewhere in LA. I was not able to translate my code from QM2 to LA (I am new to programming). Could you please help with this?
https://www.opera-arias.com/arias/
I am trying to extract the aria information to a CSV file. There are 100+ pages, and I was wondering whether it is possible to write a "for" loop that automatically finds the "Next" page button and clicks it after each cycle. Thank you so much for any help!
Posts: 12,073
Threads: 140
Joined: Dec 2002
05-14-2023, 07:34 AM
(This post was last modified: 05-20-2023, 04:05 PM by Gintaras.)
Quote: I was worried I would be kicked off the forum for asking too many "stupid" questions.
Your questions help LA development: they test LA with real tasks, etc.
Quote: Will HtmlDoc be included in LA?
Unlikely. Use other libraries, for example HtmlAgilityPack. Look in Cookbook -> Internet -> Parse HTML.
In this case elm can be used.
// script "Opera arias.cs"
//https://www.opera-arias.com/arias/#x
print.clear();
var csv = new csvTable { ColumnCount = 9 };
var w = wnd.find(1, "Opera Arias *- Google Chrome", "Chrome_WidgetWin_1");
for (; ; ) {
    _Page();
    //break;
    var next = w.Elm["web:LINK", "Next"].Find(-1);
    if (next == null) break;
    next.WebInvoke();
}
print.it(csv);

void _Page() {
    var table = w.Elm["web:GROUPING", prop: "@id=table_div"].Find(1);
    //Some cells are empty, and there are no elms for empty cells, therefore cell indices become incorrect.
    //Solution: at first get column x offsets from the header row. Then can skip empty cells.
    var header = table.Navigate("pr");
    var ax = header.Elm["LINK"].FindAll().Select(o => o.Rect.CenterX).ToArray();
    for (var row = table.Navigate("fi"); row != null; row = row.Navigate("ne")) {
        var cells = new string[csv.ColumnCount];
        var cell = row.Navigate("fi ne");
        for (int i = 0; i < csv.ColumnCount; i++) {
            if (i > 0) { cell = cell.Navigate("ne"); if (cell == null) break; }
            //correct column index for empty cells
            for (int x = cell.Rect.left; x > ax[i] && x != 0;) i++;
            var s = i switch { 1 => cell.HtmlAttribute("style")[6..^2], 6 => cell.Navigate("fi").Name, _ => cell.Name };
            if (i == csv.ColumnCount - 1) { //the last column. Some cells consist of multiple elements.
                while ((cell = cell.Navigate("ne")) != null) s += cell.Name;
            }
            cells[i] = s;
        }
        csv.AddRow(cells);
    }
}
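The idea in the two comments above (take the column x offsets from the header row, then skip columns whose cell is missing) can be sketched in plain C# without LA. This is only an illustration: the pixel values and the ColumnOf helper are made up, and the real script reads the offsets from elm rectangles.

```csharp
using System;

// Hypothetical x centers of the header cells, left to right (pixels).
int[] ax = { 50, 150, 250, 350 };

// Hypothetical helper: starting at startColumn, skips every column whose
// header center lies to the left of the cell's left edge.
int ColumnOf(int cellLeft, int startColumn) {
    int i = startColumn;
    while (i < ax.Length - 1 && cellLeft > ax[i]) i++;
    return i;
}

// A row where the cell of column 1 is empty, so only 3 elements exist.
// These are the left edges of the elements that do exist.
int[] cellLefts = { 10, 210, 310 };

var placed = new int[cellLefts.Length];
int col = 0;
for (int k = 0; k < cellLefts.Length; k++) {
    col = ColumnOf(cellLefts[k], col); // correct the index for skipped empty cells
    placed[k] = col;
    col++; // the next element belongs to a later column
}
Console.WriteLine(string.Join(",", placed)); // 0,2,3
```

The second element lands in column 2, not column 1, because its left edge is to the right of the column 1 header center.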
Posts: 95
Threads: 25
Joined: May 2023
Hi Dear Gintaras,
I tried the code, but nothing was shown in the output panel. I also tried saving the CSV to the desktop folder and got nothing either. Could you please help me figure it out? By the way, does this code work for similar tables on other web pages (after modification)?
Thank you so much for the code. It is hard for me to understand every step. What is this part for: "var s = i switch { 1 => cell.HtmlAttribute("style")[6..^2]"?
Posts: 12,073
Threads: 140
Joined: Dec 2002
The script may run for more than 30 s, and only then prints the CSV. Tested, never fails.
I tested in Chrome 113 + "uBlock Origin" extension.
If it does not work, use print.it(...) to debug the script.
That line gets some more useful text than just cell.Name (I guess). If you don't need it, replace the line with var s = cell.Name;.
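As for the [6..^2] part, it is the standard C# range operator applied to a string: it skips the first 6 characters and drops the last 2. A minimal sketch, using a made-up string, not the real "style" attribute text from the page:

```csharp
using System;

string style = "abcdefXYZ12"; // hypothetical value, for illustration only

// From index 6 up to (but not including) the 2nd character from the end.
string s = style[6..^2];

// The same result with Substring.
string t = style.Substring(6, style.Length - 6 - 2);

Console.WriteLine(s); // XYZ
Console.WriteLine(s == t); // True
```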
Posts: 95
Threads: 25
Joined: May 2023
I am so sorry for my previous reply. It works very smoothly. I waited a few seconds without seeing any output and stopped the script. I didn't notice that the results are displayed only after all the code has finished running (after the whole for loop). I have also tested it with other similar websites to extract table content, and it works smoothly there too. I love love love LA.....!
Gintaras, I can't find words to express my gratitude!
Posts: 95
Threads: 25
Joined: May 2023
Hi Gintaras,
I just noticed that some cells look like this: "Musetta/Alcindoro/Mimì/Rodolfo", but the code extracts only the first part, "Musetta", instead of the whole text. Would it be possible to join the parts into one cell? For an example, please see the last cell of row 25 on page 1.
Thank you so much!
Posts: 12,073
Threads: 140
Joined: Dec 2002
Fixed. Please use the updated code.
Posts: 95
Threads: 25
Joined: May 2023
Works great! No technical support could be faster or more efficient than you! Thanks.
Posts: 95
Threads: 25
Joined: May 2023
Hi Gintaras,
I am still studying this code. The line I marked can be replaced with var cell = row.Navigate("ch2"). Why did you use "fi ne"? I didn't understand it until I tried ch2. Could you please explain why you used "fi ne"? Is there any difference between them?
Thank you so much!
Posts: 12,073
Threads: 140
Joined: Dec 2002
05-27-2023, 07:16 AM
(This post was last modified: 05-27-2023, 07:18 AM by Gintaras.)
No difference. It means "get the first child, then its next sibling". The result is the same as ch2, which means "get the second child".
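A plain C# sketch of why the two paths end at the same element. The list and the three helpers are hypothetical stand-ins for a row's child elements and for the "fi", "ne" and "chN" steps; they are not LA API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical children of one table row element.
var children = new List<string> { "checkbox", "title", "composer" };

// No bounds checks, for brevity.
string First() => children[0];                                        // like "fi"
string Next(string child) => children[children.IndexOf(child) + 1];   // like "ne"
string Child(int n) => children[n - 1];                               // like "chN", 1-based

var viaFiNe = Next(First()); // first child, then its next sibling
var viaCh2 = Child(2);       // second child directly

Console.WriteLine(viaFiNe == viaCh2); // True
```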
Posts: 95
Threads: 25
Joined: May 2023
Hi Gintaras, how can I make a loop that turns pages if there is no "Next" button on the page? Thanks!
https://notes-box.com/musicians/
Posts: 12,073
Threads: 140
Joined: Dec 2002
One way is to use a list of link names.
// script "notes-box.cs"
//https://notes-box.com/musicians/a/
print.clear();
var w = wnd.find(0, "* Google Chrome", "Chrome_WidgetWin_1");
foreach (var v in "0-9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z".Split(',')) {
    var e = w.Elm["web:LINK", v].Find(5);
    print.it(e);
    e.Invoke();
    e = w.Elm["web:GROUPING", "Artist* " + v].Find(5);
    var links = e.Parent.Elm["LINK", prop: "level=0"].FindAll();
    print.it(links.Length);
}
Posts: 12,073
Threads: 140
Joined: Dec 2002
05-31-2023, 04:25 AM
(This post was last modified: 05-31-2023, 04:38 AM by Gintaras.)
// script "notes-box.cs"
//https://notes-box.com/musicians/a/
print.clear();
var w = wnd.find(0, "* Google Chrome", "Chrome_WidgetWin_1");
foreach (var v in "0-9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z".Split(',')) {
//foreach (var v in "A".Split(',')) { //test 1 page
    //open page
    var e = w.Elm["web:LINK", v].Find(5);
    print.it(e.Name);
    e.Invoke();
    e = w.Elm["web:GROUPING", "Artist* " + v].Find(5);
    //get all artists
    //show all
    var all = w.Elm["web:LINK", "All"].Find(-1);
    if (all != null) {
        all.Invoke();
        e = w.Elm["web:LINK", "Paged"].Find(10);
    }
    var links = e.Parent.Elm["LINK", prop: "level=0"].FindAll();
    print.it(links.Length);
}
Posts: 95
Threads: 25
Joined: May 2023
Thank you. This is so creative. I found that there are always more solutions than problems, and there are many ways to do it. It's amazing!
Posts: 95
Threads: 25
Joined: May 2023
Hi Gintaras, sorry to bother you again. The buttons on this page have no page numbers (https://filecr.com/ms-windows/?id=685550968000); all of them have the same name, "pagination". I tried the method you used before, but it failed. Is there any solution to this? I really appreciate your help.
var w = wnd.find(1, "* - Google Chrome", "Chrome_WidgetWin_1");
for (var e = w.Elm["web:BUTTON", "pagination"].Find(); e != null; e = e.Navigate("ne")) {
    e.Invoke();
    2.s();
}
Posts: 12,073
Threads: 140
Joined: Dec 2002
06-01-2023, 07:23 AM
(This post was last modified: 06-01-2023, 08:33 AM by Gintaras.)
Try Selenium. It is more reliable for web browser automation; for example, it can reliably wait until a web page has loaded. But it is not so easy to use, and you will need to install 2 NuGet packages and update them for each new Chrome version.
// script ""
/*/ nuget selenium\Selenium.Support; nuget selenium\Selenium.WebDriver.ChromeDriver; /*/
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Interactions;
using OpenQA.Selenium.Support.Extensions;
using OpenQA.Selenium.Support.UI;
script.setup(trayIcon: true, sleepExit: true, exitKey: KKey.MediaStop, pauseKey: KKey.MediaPlayPause);
print.clear();
//Starts new Chrome instance.
ChromeOptions options = new();
//Enable and maybe edit this if want to use an existing profile. To get profile path, in Chrome open URL "chrome://version/".
// Then before starting this script also may need to close existing Chrome instances that use this profile.
//options.AddArguments($"user-data-dir={folders.LocalAppData + @"Google\Chrome\User Data"}", "profile-directory=Profile 1");
var service = ChromeDriverService.CreateDefaultService();
service.HideCommandPromptWindow = true;
using var driver = new ChromeDriver(service, options);
driver.Manage().Window.Maximize();
for (int i = 1; i <= 5; i++) {
    script.pause();
    driver.Navigate().GoToUrl($"https://filecr.com/ms-windows/?page={i}"); //opens and waits until loaded
    1.s();
}
1.s();
dialog.show("Close web browser", x: ^1);
The same approach can be used without Selenium, but I cannot find a reliable way to wait until a web page has loaded, so you would need to add delays, which makes the script slower.
Posts: 12,073
Threads: 140
Joined: Dec 2002
This script uses an existing Chrome instance, which must be started with a special command line.
// script ""
/*/ nuget selenium\Selenium.Support; nuget selenium\Selenium.WebDriver.ChromeDriver; /*/
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Interactions;
using OpenQA.Selenium.Support.Extensions;
using OpenQA.Selenium.Support.UI;
script.setup(trayIcon: true, sleepExit: true, exitKey: KKey.MediaStop, pauseKey: KKey.MediaPlayPause);
print.clear();
//Chrome must be launched with command line like this:
//run.it("chrome.exe", "--remote-debugging-port=9222");
////run.it("chrome.exe", $"--remote-debugging-port=9222 --user-data-dir=\"{folders.LocalAppData + @"Google\Chrome\User Data"}\"");
ChromeOptions options = new() { DebuggerAddress = "127.0.0.1:9222" };
var service = ChromeDriverService.CreateDefaultService();
service.HideCommandPromptWindow = true;
using var driver = new ChromeDriver(service, options);
driver.Manage().Window.Maximize();
for (int i = 1; i <= 2; i++) {
    script.pause();
    driver.Navigate().GoToUrl($"https://filecr.com/ms-windows/?page={i}"); //opens and waits until loaded
    1.s();
}
Posts: 95
Threads: 25
Joined: May 2023
Thank you so much. It’s so powerful to combine LA with these packages.