
Extract HTML Table from Web Page source

Discussion in 'Scripting' started by RoyalCoder, Jan 29, 2018.

  1. RoyalCoder

    Joined:
    Oct 4, 2013
    Posts:
    301
    Hi my friends,

    I built a script that reads and downloads the source of a specific web page into a text file in Unity. What I really want is to extract only the HTML table data from that page. In the example below, the red text should be removed and the green text (the table data) kept and extracted:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html lang="en" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
    <head>
    <link rel="shortcut icon" href="https://mywebsite.com/favicon.ico" />
    <title>My Website Com</title>
    <meta name="description" content="Ministerul pentru intreprinderi mici si mijlocii, comert, turism si profesii liberale"/>
    <meta name="keywords" content="My Web Site"/>
    <meta name="Language" content="en"/>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <meta name="rating" content="General" />
    <meta name="revisit-after" content="7 Days" />
    <meta name="robots" content="index,follow" />
    <link rel="shortcut icon" href="/favicon.ico" />
    <meta name="publisher" content="Unity Design" />
    <meta name="copyright" content="Copyright (c) Unity Design" />
    <meta name="author" content="Developed by Unity Design - www.UnityDesign.com" />
    <link href="/css/style.css?t=2017061401" rel="stylesheet" type="text/css" />
    <link href="/css/uploader.css?t=2017061401" rel="stylesheet" type="text/css" />
    <script type="text/javascript" src="/js/jquery-1.8.0.min.js?t=2017061401"></script>
    <script>
    var jQr = jQuery.noConflict();
    </script>
    <script type="text/javascript" src="/js/mootools-1.2.5-core-yc.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/mootools-1.2.5.1-more.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Swiff.Uploader.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Fx.ProgressBar.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/Lang.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/uploader/FancyUpload2.js?t=2017061401"></script>
    <script type="text/javascript" src="/js/js.js?t=2017061401"></script>
    <script src='https://www.google.com/recaptcha/api.js?hl=en'></script>
    </head>
    <body onload="$('ajaxloader').setStyle('display','none')"><div id="container">

    <div class="logo_container">
    <a href="/" id="logo" title="MWC - Home Page"><img src="/i/logo.png?40084" /></a>

    <div style="position:absolute; right:0; top:107px;" id="ajaxloader"><img src="/i/ajax-loader.gif" /></div>
    </div>
    <div class="menu_top">
    <a href="https://mywebsite.com/" title="Home Page"><h2>Home Page</h2></a>
    <a href="https://mywebsite.com/contact/" title="Contact"><h2>Contact</h2></a>
    <div class="clear"></div>
    </div>
    <div style="clear:both;"></div>
    <div style="padding:5px 0;"></div>

    <div id="content" ><h1>List of items: Example</h1><br><br>

    <div class="tableExample" style="padding-left:0;">
    <table class="formular">

    <tr>
    <th>Position</th>
    <th>Name of item</th>
    <th>Date added</th>
    </tr>
    <tr>
    <td>1</td>
    <td>John</td>
    <td>2017-07-14 19:19</td>
    </tr>
    <tr>
    <td>2</td>
    <td>Jane</td>
    <td>2017-07-14 19:30</td>
    </tr>
    <tr>
    <td>3</td>
    <td>Kelly</td>
    <td>2017-07-14 18:44</td>
    </tr>
    <tr>
    <td>4</td>
    <td>Michael</td>
    <td>2017-07-12 12:49</td>
    </tr>
    <tr>
    <td>5</td>
    <td>William</td>
    <td>2017-07-13 00:26</td>
    </tr>
    </table>
    </div>
    </div><script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-100774771-1', 'auto');
    ga('send', 'pageview');

    </script>
    </body>
    </html>

    Any ideas how I can achieve this?
    Thanks in advance!
     
  2. Zonlib

    Joined:
    Apr 15, 2014
    Posts:
    39
    You can write a script that reads the HTML page as an XML document and gets the nodes named 'table'.
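
    The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested Unity solution: it is written in Python for brevity (the thread shows no code; in a Unity C# script `System.Xml.XmlDocument` would play the same role), and the function name `extract_table_rows` and the shortened `sample` page are made up for the example. Because the full page in the question is not well-formed XML (for instance, the `<br>` tags are never closed), the sketch first cuts out the `<table>...</table>` fragment and parses only that. It assumes the table fragment itself is well-formed, as it is in the sample above.

```python
import re
import xml.etree.ElementTree as ET


def extract_table_rows(html):
    """Return the rows of the first <table> in `html` as lists of cell text.

    Hypothetical helper for illustration. Raises xml.etree.ElementTree.ParseError
    if the table fragment is not well-formed XML.
    """
    # Cut out the first <table>...</table> block so the rest of the page
    # (doctype, scripts, unclosed <br> tags) never reaches the XML parser.
    match = re.search(r"<table\b.*?</table>", html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return []

    table = ET.fromstring(match.group(0))

    rows = []
    for element in table.iter():
        if element.tag.lower() != "tr":
            continue
        # Collect header (<th>) and data (<td>) cells in document order.
        cells = [
            "".join(cell.itertext()).strip()
            for cell in element
            if cell.tag.lower() in ("th", "td")
        ]
        rows.append(cells)
    return rows


# Shortened stand-in for the page from the question (assumed sample data).
sample = """<html><body><div id="content"><h1>List of items: Example</h1><br><br>
<table class="formular">
<tr><th>Position</th><th>Name of item</th><th>Date added</th></tr>
<tr><td>1</td><td>John</td><td>2017-07-14 19:19</td></tr>
<tr><td>2</td><td>Jane</td><td>2017-07-14 19:30</td></tr>
</table>
</div><script>ga('send', 'pageview');</script></body></html>"""

rows = extract_table_rows(sample)
for row in rows:
    print(row)
```

    If the real table contains markup that XML does not accept (unclosed tags, or HTML entities such as `&nbsp;`), the parse step will fail, and a dedicated HTML parser is the safer choice.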
     
    Last edited: Jan 29, 2018
    RoyalCoder likes this.
  3. johne5

    Joined:
    Dec 4, 2011
    Posts:
    1,133
    RoyalCoder likes this.
  4. Brathnann

    Joined:
    Aug 12, 2014
    Posts:
    7,146
    RoyalCoder likes this.
  5. pandigital

    Joined:
    Mar 20, 2009
    Posts:
    15
    Hi, can anyone point me to (or provide) a guide for getting AngleSharp working with Unity?
     
  6. Brathnann

    Joined:
    Aug 12, 2014
    Posts:
    7,146