birdywen · 07-21-2023, 01:31 PM

Hi Gintaras,
Is it possible to use LA to extract just the Python code from an HTML file (converted from a Jupyter notebook)? Thank you very much! Here is the HTML file: Car price

burque505 · 07-22-2023, 03:47 PM

birdywen, might I ask whether you first converted the Jupyter notebook with e.g. "jupyter nbconvert thenotebook.ipynb --to html"? If you did, you can extract the Python code by running "jupyter nbconvert thenotebook.ipynb --to python" instead, which will give you a runnable script. If you only have the already-converted file to work with, there may be several ways to do this, with or without LA.
Regards,
burque505
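
If the original .ipynb is available but running nbconvert is not convenient, the code cells could also be read directly from C#. This is only a minimal sketch and not what nbconvert does: it assumes the notebook is nbformat 4 JSON (a top-level "cells" array whose items have "cell_type" and "source"). `ExtractCodeCells` and the inline sample notebook are made up for illustration. In an LA script, print.it could replace Console.WriteLine.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Made-up inline notebook so the sketch runs on its own.
// For a real file, pass System.IO.File.ReadAllText(pathToIpynb) instead.
var sample = """
{"cells":[
 {"cell_type":"markdown","source":["# Car price"]},
 {"cell_type":"code","source":["import pandas as pd\n","df = pd.read_csv('cars.csv')"]}
]}
""";

var codeCells = ExtractCodeCells(sample);
foreach (var code in codeCells) Console.WriteLine(code);

// Hypothetical helper: returns the source text of every code cell.
static List<string> ExtractCodeCells(string notebookJson)
{
    var result = new List<string>();
    using var doc = JsonDocument.Parse(notebookJson);
    foreach (var cell in doc.RootElement.GetProperty("cells").EnumerateArray())
    {
        // Skip markdown and raw cells.
        if (cell.GetProperty("cell_type").GetString() != "code") continue;

        var source = cell.GetProperty("source");
        if (source.ValueKind == JsonValueKind.String)
        {
            // "source" stored as one string.
            result.Add(source.GetString() ?? "");
        }
        else
        {
            // "source" stored as an array of line strings; join them as they are.
            var sb = new StringBuilder();
            foreach (var line in source.EnumerateArray()) sb.Append(line.GetString());
            result.Add(sb.ToString());
        }
    }
    return result;
}
```

This avoids HTML parsing and entity decoding altogether, because the notebook file stores the code as plain JSON strings.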

birdywen · (This post was last modified: 07-25-2023, 12:47 PM by birdywen.)

Code:
// script "财联社.cs"
/*/
nuget -\HtmlAgilityPack;
/*/
using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var path = @"C:\Users\birdy\Desktop\CIS 512- Car Price Prediction-0719.html";
        var doc = new HtmlDocument();
        doc.Load(path);

        // SelectNodes returns null when the XPath matches nothing.
        var nodes = doc.DocumentNode.SelectNodes("/html/body/div/div[1]/div[2]/div[2]/div/div/pre");
        if (nodes == null) return;

        // Print the text of each matched <pre> element (one per code cell).
        foreach (var t in nodes) {
            print.it(t.InnerText);
        }
    }
}

How can I replace the HTML entities in the output with the actual characters ', ", > and <? Thank you so much!

birdywen · 07-25-2023, 12:48 PM

How can I replace the HTML entities with the actual characters ', ", < and >?

***Gintaras*** · 07-25-2023, 01:37 PM

It's strange that the library does not automatically replace HTML entities. But namespace HtmlAgilityPack has a class for it.

print.it(HtmlEntity.DeEntitize(t.InnerText));
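
For comparison, .NET has a built-in decoder too. The sketch below uses System.Net.WebUtility.HtmlDecode, a different method from HtmlEntity.DeEntitize, on a made-up sample string. It only illustrates what decoding does to the four characters asked about. In an LA script, print.it could replace Console.WriteLine.

```csharp
using System;
using System.Net;

// Made-up sample of what InnerText can look like before decoding.
var raw = "if a &lt; b and s == &quot;x&quot;: print(&#39;ok&#39;)";

// Replace HTML entities with the characters they stand for.
var decoded = WebUtility.HtmlDecode(raw);

Console.WriteLine(decoded); // if a < b and s == "x": print('ok')
```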

birdywen · 07-25-2023, 01:50 PM

Wow, it worked very well! Thanks.

birdywen · 07-26-2023, 07:41 PM

Hi Gintaras,
Is it possible for LA to read the inner text of UI elements, like reading InnerText in HtmlAgilityPack? I ask because the webpage content I want is sometimes available only after logging in, and HtmlAgilityPack cannot extract that text without being logged in to the website.
Thank you!

***Gintaras*** · (This post was last modified: 07-30-2023, 04:17 AM by Gintaras.)

I know 2 ways, but probably more exist. Google: "C# extract Chrome web page element text".
1. Get HTML with elm.Html. Then somehow convert HTML to text, for example using regular expressions or HtmlAgilityPack.
2. Use Selenium. But it has problems connecting to an existing web browser window. Look in Cookbook.
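
The regular expression route could be sketched roughly like this. It is a crude approximation, not a real HTML parser: `HtmlToText` is a hypothetical helper, and the sample string only stands in for whatever elm.Html returns. Every tag is replaced with a space, so text split by inline tags gets an extra space. For nested or malformed markup, HtmlAgilityPack is likely the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Stand-in for HTML obtained from a UI element.
var sampleHtml = "<p>Use <b>print()</b> when a &lt; b</p>";
var text = HtmlToText(sampleHtml);
Console.WriteLine(text); // Use print() when a < b

// Hypothetical helper: strips tags, decodes entities, collapses whitespace.
static string HtmlToText(string html)
{
    // Drop script and style elements together with their content.
    var s = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replace every remaining tag with a space.
    s = Regex.Replace(s, @"<[^>]+>", " ");

    // Decode entities such as &lt; and &quot;.
    s = WebUtility.HtmlDecode(s);

    // Collapse runs of whitespace and trim the ends.
    return Regex.Replace(s, @"\s+", " ").Trim();
}
```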

birdywen · (This post was last modified: 07-31-2023, 01:28 PM by birdywen.)

Hi Gintaras,

Your method, elm.Html combined with HtmlAgilityPack, is a perfect combination. Now I can easily extract any text or other content (I mean any Elm) I want from any webpage.
That's so nice!
Thank you so much!

Code:
// script ""
/*/ nuget -\HtmlAgilityPack; /*/
//https://zerotomastery.io/cheatsheets/python-cheat-sheet/
using HtmlAgilityPack;

var w = wnd.find(1, "The Best Python Cheat Sheet | Zero To Mastery - Google Chrome", "Chrome_WidgetWin_1");
foreach (var e in w.Elm["web:GROUPING", prop: "@id=cheatsheet-content"]["TEXT", prop: "level=2"].FindAll()) {
    var html = e.Html(false);
    var doc1 = new HtmlDocument();
    doc1.LoadHtml(html);
    print.it(doc1.DocumentNode.InnerText);
}
