
Question: How to extract the text content between specific HTML tags



ak_lbrrn
Sunday, 25 Mordad 1394, 18:58
In the name of God. Hello. With the code below I want to extract the text between the <p> and </p> tags (from a text file that holds the saved HTML source of a page) and save it to a new text file, but my new file is empty! Could you tell me what's wrong with my code? Thanks.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.IO;

namespace text_extraction
{
    internal class Program
    {
        public static string StripTagsCharArray(string source)
        {
            char[] array = new char[source.Length];
            int arrayIndex = 0;
            bool inside = false;
            for (int i = 0; i < source.Length; i++)
            {
                char let = source[i];
                if (let == '<')
                {
                    i++;
                    if (let == 'p')
                    {
                        i++;
                        if (let == '>')
                        {
                            i++;
                            inside = true;
                            continue;
                        }
                        if (let == '<')
                        {
                            inside = false;
                            continue;
                        }
                    }
                }
                while (inside)
                {
                    array[arrayIndex] = let;
                    arrayIndex++;
                }
            }
            return new string(array, 0, arrayIndex);
        }

        private static void Main(string[] args)
        {
            string thepath = Environment.CurrentDirectory;
            string removed_htmltags = thepath + @"\htmlremovaltest.txt";
            string line = File.ReadAllText(@"f:\\local.txt");
            File.WriteAllText(removed_htmltags, StripTagsCharArray(line));
            Console.WriteLine("Tags are removed, please check the file");
            Console.ReadKey();
        }
    }
}
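Reading the code above, the empty file is most likely explained by the loop logic: after `i++` the variable `let` is never re-read from `source[i]`, so `let == 'p'` still compares `'<'` with `'p'` and is always false. As a result `inside` never becomes `true` and nothing is copied. If it ever did become `true`, `while (inside)` would never exit on its own and would end in an `IndexOutOfRangeException` on `array`. Below is a minimal corrected sketch of the same character-scanning idea, assuming paragraphs are written as `<p>` or `<p attr="...">` and always closed with `</p>`; the method name `ExtractParagraphText` is made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

namespace text_extraction
{
    internal class Program
    {
        // Copies the text found between <p ...> and </p> and drops every tag,
        // including tags nested inside the paragraph (<a>, <i>, <sup>, ...).
        // Each closed paragraph is followed by a line break.
        public static string ExtractParagraphText(string source)
        {
            var result = new StringBuilder();
            bool inside = false;          // true between <p> and </p>
            int i = 0;
            while (i < source.Length)
            {
                if (source[i] == '<')
                {
                    int end = source.IndexOf('>', i);
                    if (end < 0) break;   // unterminated tag: stop scanning

                    // Tag name plus attributes, e.g. "p", "/p", "a href=...".
                    string tag = source.Substring(i + 1, end - i - 1).Trim().ToLowerInvariant();
                    bool isOpenP = tag == "p" ||
                                   (tag.Length > 1 && tag[0] == 'p' && char.IsWhiteSpace(tag[1]));
                    if (isOpenP)
                    {
                        inside = true;
                    }
                    else if (tag == "/p")
                    {
                        inside = false;
                        result.AppendLine();
                    }
                    i = end + 1;          // skip the whole tag, whatever it is
                    continue;
                }

                if (inside)
                    result.Append(source[i]);   // plain text inside a paragraph
                i++;
            }
            return result.ToString();
        }

        private static void Main(string[] args)
        {
            // Optional args[0] overrides the input path from the original post.
            // In a verbatim string (@"...") a single backslash is enough.
            string inputPath = args.Length > 0 ? args[0] : @"f:\local.txt";
            string outputPath = Path.Combine(Environment.CurrentDirectory, "htmlremovaltest.txt");

            string html = File.ReadAllText(inputPath);
            File.WriteAllText(outputPath, ExtractParagraphText(html));
            Console.WriteLine("Paragraph text extracted, please check the file");
            Console.ReadKey();
        }
    }
}
```

Note that `<pre>` is not mistaken for `<p>` because the character after `p` must be whitespace or the end of the tag. HTML entities such as `&amp;` are copied as-is.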

juza66
Sunday, 25 Mordad 1394, 19:43
Not sure, is this what you mean?


WebBrowser1.Document.GetElementById("ID").InnerText = "your desired text";

ak_lbrrn
Sunday, 25 Mordad 1394, 20:33
Suppose I have a text file that contains HTML source like the lines below:


<p>In addition to facilitating the reproduction of flowering plants, flowers have long been admired and used by humans to beautify their environment, and also as objects of romance, ritual, religion, medicine and as a source of food.</p>
<p></p>
<div id="toc" class="toc">
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Morphology"><span class="tocnumber">1</span> <span class="toctext">Morphology</span></a>
<ul>
<li class="toclevel-2 tocsection-2"><a href="#Floral_parts"><span class="tocnumber">1.1</span> <span class="toctext">Floral parts</span></a>
<ul>
<li class="toclevel-3 tocsection-3"><a href="#Vegetative_.28Perianth.29"><span class="tocnumber">1.1.1</span> <span class="toctext">Vegetative (Perianth)</span></a></li>
<li class="toclevel-3 tocsection-4"><a href="#Reproductive"><span class="tocnumber">1.1.2</span> <span class="toctext">Reproductive</span></a></li>
</ul>
</li>
<li class="toclevel-2 tocsection-5"><a href="#Structure"><span class="tocnumber">1.2</span> <span class="toctext">Structure</span></a>
<ul>
<li class="toclevel-3 tocsection-6"><a href="#Inflorescence"><span class="tocnumber">1.2.1</span> <span class="toctext">Inflorescence</span></a></li>
<li class="toclevel-3 tocsection-7"><a href="#Floral_diagrams_and_floral_formulae"><span class="tocnumber">1.2.2</span> <span class="toctext">Floral diagrams and floral formulae</span></a></li>
</ul>
</li>
</ul>
</li>
<li class="toclevel-1 tocsection-8"><a href="#Development"><span class="tocnumber">2</span> <span class="toctext">Development</span></a>
<ul>
<li class="toclevel-2 tocsection-9"><a href="#Flowering_transition"><span class="tocnumber">2.1</span> <span class="toctext">Flowering transition</span></a></li>
<li class="toclevel-2 tocsection-10"><a href="#Organ_development"><span class="tocnumber">2.2</span> <span class="toctext">Organ development</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-11"><a href="#Floral_function"><span class="tocnumber">3</span> <span class="toctext">Floral function</span></a>
<ul>
<li class="toclevel-2 tocsection-12"><a href="#Flower_specialization_and_pollination"><span class="tocnumber">3.1</span> <span class="toctext">Flower specialization and pollination</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-13"><a href="#Pollination"><span class="tocnumber">4</span> <span class="toctext">Pollination</span></a>
<ul>
<li class="toclevel-2 tocsection-14"><a href="#Pollen"><span class="tocnumber">4.1</span> <span class="toctext">Pollen</span></a></li>
<li class="toclevel-2 tocsection-15"><a href="#Attraction_methods"><span class="tocnumber">4.2</span> <span class="toctext">Attraction methods</span></a></li>
<li class="toclevel-2 tocsection-16"><a href="#Pollination_mechanism"><span class="tocnumber">4.3</span> <span class="toctext">Pollination mechanism</span></a></li>
<li class="toclevel-2 tocsection-17"><a href="#Flower-pollinator_relationships"><span class="tocnumber">4.4</span> <span class="toctext">Flower-pollinator relationships</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-18"><a href="#Fertilization_and_dispersal"><span class="tocnumber">5</span> <span class="toctext">Fertilization and dispersal</span></a></li>
<li class="toclevel-1 tocsection-19"><a href="#Evolution"><span class="tocnumber">6</span> <span class="toctext">Evolution</span></a></li>
<li class="toclevel-1 tocsection-20"><a href="#Symbolism"><span class="tocnumber">7</span> <span class="toctext">Symbolism</span></a></li>
<li class="toclevel-1 tocsection-21"><a href="#Usage"><span class="tocnumber">8</span> <span class="toctext">Usage</span></a></li>
<li class="toclevel-1 tocsection-22"><a href="#See_also"><span class="tocnumber">9</span> <span class="toctext">See also</span></a></li>
<li class="toclevel-1 tocsection-23"><a href="#References"><span class="tocnumber">10</span> <span class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-24"><a href="#Further_reading"><span class="tocnumber">11</span> <span class="toctext">Further reading</span></a></li>
<li class="toclevel-1 tocsection-25"><a href="#External_links"><span class="tocnumber">12</span> <span class="toctext">External links</span></a></li>
</ul>
</div>
<p></p>
<h2><span class="mw-headline" id="Morphology">Morphology</span></h2>
<div class="thumb tright">
<div class="thumbinner" style="width:302px;"><a href="/wiki/File:Ranunculus_glaberrimus_labelled.jpg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Ranunculus_glaberrimus_labelled.jpg/300px-Ranunculus_glaberrimus_labelled.jpg" width="300" height="159" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Ranunculus_glaberrimus_labelled.jpg/450px-Ranunculus_glaberrimus_labelled.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Ranunculus_glaberrimus_labelled.jpg/600px-Ranunculus_glaberrimus_labelled.jpg 2x" data-file-width="2090" data-file-height="1110" /></a>
<div class="thumbcaption">
<div class="magnify"><a href="/wiki/File:Ranunculus_glaberrimus_labelled.jpg" class="internal" title="Enlarge"></a></div>
Main parts of a mature flower (<a href="/wiki/Ranunculus_glaberrimus" title="Ranunculus glaberrimus">Ranunculus glaberrimus</a>)</div>
</div>
</div>
<div class="thumb tright">
<div class="thumbinner" style="width:302px;"><a href="/wiki/File:Mature_flower_diagram.svg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/7/7f/Mature_flower_diagram.svg/300px-Mature_flower_diagram.svg.png" width="300" height="154" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/7/7f/Mature_flower_diagram.svg/450px-Mature_flower_diagram.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/7/7f/Mature_flower_diagram.svg/600px-Mature_flower_diagram.svg.png 2x" data-file-width="423" data-file-height="217" /></a>
<div class="thumbcaption">
<div class="magnify"><a href="/wiki/File:Mature_flower_diagram.svg" class="internal" title="Enlarge"></a></div>
Diagram of flower parts</div>
</div>
</div>
<h3><span class="mw-headline" id="Floral_parts">Floral parts</span></h3>
<p>The essential parts of a flower can be considered in two parts: the vegetative part, consisting of petals and associated structures in the perianth, and the reproductive or sexual parts. A stereotypical flower consists of four kinds of structures attached to the tip of a short stalk. Each of these kinds of parts is arranged in a <a href="/wiki/Whorl_(botany)" title="Whorl (botany)">whorl</a> on the <a href="/wiki/Receptacle_(botany)" title="Receptacle (botany)">receptacle</a>. The four main whorls (starting from the base of the flower or lowest node and working upwards) are as follows:</p>
<h4><span class="mw-headline" id="Vegetative_.28Perianth.29">Vegetative (Perianth)</span></h4>
<div class="hatnote relarticle mainarticle">Main articles: <a href="/wiki/Perianth" title="Perianth">Perianth</a>, <a href="/wiki/Sepal" title="Sepal">Sepal</a> and <a href="/wiki/Corolla_(flower)" title="Corolla (flower)" class="mw-redirect">Corolla (flower)</a></div>
<p>Collectively the calyx and corolla form the <a href="/wiki/Perianth" title="Perianth">perianth</a> (see diagram).</p>
<ul>
<li><i><a href="/wiki/Sepal" title="Sepal">Calyx</a></i>: the outermost whorl consisting of units called <i><a href="/wiki/Sepal" title="Sepal">sepals</a></i>; these are typically green and enclose the rest of the flower in the bud stage, however, they can be absent or prominent and petal-like in some species.</li>
<li><i><a href="/wiki/Petal" title="Petal">Corolla</a></i>: the next whorl toward the apex, composed of units called <i><a href="/wiki/Petal" title="Petal">petals</a></i>, which are typically thin, soft and colored to attract animals that help the process of <a href="/wiki/Pollination" title="Pollination">pollination</a>.</li>
</ul>
<h4><span class="mw-headline" id="Reproductive">Reproductive</span></h4>
<div class="hatnote relarticle mainarticle">Main articles: <a href="/wiki/Plant_reproductive_morphology" title="Plant reproductive morphology">Plant reproductive morphology</a>, <a href="/wiki/Androecium" title="Androecium" class="mw-redirect">Androecium</a> and <a href="/wiki/Gynoecium" title="Gynoecium">Gynoecium</a></div>
<div class="thumb tright">
<div class="thumbinner" style="width:302px;"><a href="/wiki/File:Lillium_Stamens.jpg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Lillium_Stamens.jpg/300px-Lillium_Stamens.jpg" width="300" height="450" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Lillium_Stamens.jpg/450px-Lillium_Stamens.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Lillium_Stamens.jpg/600px-Lillium_Stamens.jpg 2x" data-file-width="1296" data-file-height="1944" /></a>
<div class="thumbcaption">
<div class="magnify"><a href="/wiki/File:Lillium_Stamens.jpg" class="internal" title="Enlarge"></a></div>
Reproductive parts of Easter Lily (<i>Lilium longiflorum</i>). 1. Stigma, 2. Style, 3. Stamens, 4. Filament, 5. Petal</div>
</div>
</div>
<ul>
<li><i><a href="/wiki/Androecium" title="Androecium" class="mw-redirect">Androecium</a></i> (from Greek <i>andros oikia</i>: man's house): the next whorl (sometimes multiplied into several whorls), consisting of units called <a href="/wiki/Stamen" title="Stamen">stamens</a>. Stamens consist of two parts: a stalk called a <a href="/wiki/Stamen" title="Stamen">filament</a>, topped by an <a href="/wiki/Anther" title="Anther" class="mw-redirect">anther</a> where <a href="/wiki/Pollen" title="Pollen">pollen</a> is produced by meiosis and eventually dispersed.</li>
<li><i><a href="/wiki/Gynoecium" title="Gynoecium">Gynoecium</a></i> (from Greek <i>gynaikos oikia</i>: woman's house): the innermost whorl of a flower, consisting of one or more units called carpels. The <a href="/wiki/Gynoecium#Carpels" title="Gynoecium">carpel</a> or multiple fused carpels form a hollow structure called an ovary, which produces ovules internally. Ovules are megasporangia and they in turn produce megaspores by meiosis which develop into female gametophytes. These give rise to egg cells. The gynoecium of a flower is also described using an alternative terminology wherein the structure one sees in the innermost whorl (consisting of an ovary, style and stigma) is called a pistil. A pistil may consist of a single carpel or a number of carpels fused together. The sticky tip of the pistil, the stigma, is the receptor of pollen. The supportive stalk, the style, becomes the pathway for <a href="/wiki/Pollen_tube" title="Pollen tube">pollen tubes</a> to grow from pollen grains adhering to the stigma. The relationship to the gynoecium on the receptacle is described as <b>hypogynous</b> (beneath a superior ovary), <b>perigynous</b> (surrounding a superior ovary), or <b>epigynous</b> (above inferior ovary).</li>
</ul>
<h3><span class="mw-headline" id="Structure">Structure</span></h3>
<p>Although the arrangement described above is considered "typical", plant species show a wide variation in floral structure.<sup id="cite_ref-1" class="reference"><a href="#cite_note-1"><span>[</span>1<span>]</span></a></sup> These modifications have significance in the evolution of flowering plants and are used extensively by botanists to establish relationships among plant species.</p>
<p>The four main parts of a flower are generally defined by their positions on the receptacle and not by their function. Many flowers lack some parts or parts may be modified into other functions and/or look like what is typically another part. In some families, like <a href="/wiki/Ranunculaceae" title="Ranunculaceae">Ranunculaceae</a>, the petals are greatly reduced and in many species the sepals are colorful and petal-like. Other flowers have modified stamens that are petal-like; the double flowers of <a href="/wiki/Peonie" title="Peonie" class="mw-redirect">Peonies</a> and <a href="/wiki/Rose" title="Rose">Roses</a> are mostly petaloid stamens.<sup id="cite_ref-2" class="reference"><a href="#cite_note-2"><span>[</span>2<span>]</span></a></sup> Flowers show great variation and
plant scientists describe this variation in a systematic way to identify and distinguish species.</p>

I want to keep the parts in bold (the text of the <p> paragraphs) and have everything else removed!
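Since the original code already imports `System.Text.RegularExpressions`, a regular expression is another quick option for this sample. The sketch below (the class name `ParagraphRegex` is hypothetical) finds each `<p>...</p>` pair, strips any tags nested inside it, decodes entities with `WebUtility.HtmlDecode`, and skips empty paragraphs such as the `<p></p>` lines above. It assumes `<p>` elements are never nested and always have a closing `</p>`, which holds for this sample but not for arbitrary HTML; a regex is not a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ParagraphRegex
{
    // Lazily matches <p> or <p attr="...">, up to the first </p>.
    // Singleline lets '.' match line breaks, since a paragraph may span lines.
    // The (?:\s[^>]*)? part prevents <pre> from matching.
    private static readonly Regex Paragraph =
        new Regex(@"<p(?:\s[^>]*)?>(.*?)</p>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Any remaining tag inside a paragraph, e.g. <a href="...">, </i>, <sup ...>.
    private static readonly Regex AnyTag = new Regex(@"<[^>]+>");

    public static List<string> Extract(string html)
    {
        var paragraphs = new List<string>();
        foreach (Match m in Paragraph.Matches(html))
        {
            string text = AnyTag.Replace(m.Groups[1].Value, "");
            text = WebUtility.HtmlDecode(text).Trim();   // e.g. &amp; becomes &
            if (text.Length > 0)                          // skip empty <p></p>
                paragraphs.Add(text);
        }
        return paragraphs;
    }

    public static void Main()
    {
        string html = "<p>First <b>bold</b> text.</p>\n<p></p>\n<div><h2>Contents</h2></div>\n" +
                      "<p class=\"x\">Second\nline &amp; more.</p>";
        foreach (string p in Extract(html))
            Console.WriteLine(p);
    }
}
```

To write the result to a file as in the original post, the list can be joined and passed to `File.WriteAllLines`.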

pbm_soy
Monday, 26 Mordad 1394, 03:05
From what I saw in your first post, you are trying to open and interpret the HTML file yourself, which is hard and time-consuming. I suggest using APIs and classes that parse HTML, or doing what our friend suggested: load the web page in a WebBrowser control and then access the contents of the tags through it. I think it also has a GetElementsByTagName method.

pbm_soy
Monday, 26 Mordad 1394, 03:10
Here is a good link to get started; it even includes an example you can download:

http://www.codeproject.com/Tips/804660/How-to-Parse-HTML-using-Csharp

SabaSabouhi
Monday, 26 Mordad 1394, 08:12
Hello,
Every HTML file is an XML file, so you can simply open your HTML as XML and read it easily, with straightforward access to each of its nodes, attributes and values.

Saba Sabouhi
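A minimal sketch of the XML approach with `System.Xml.Linq`, under one important assumption: the markup must be well-formed XML (XHTML). The fragment is wrapped in a dummy `<root>` element because XML requires a single root; ordinary HTML such as `<br>` without a closing slash, or entities like `&nbsp;`, will make `XDocument.Parse` throw an `XmlException`, and a page that declares the XHTML namespace would need a namespaced name in `Descendants`. The class name `XmlParagraphs` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlParagraphs
{
    // Returns the text of every <p> element in a well-formed XML fragment.
    // XElement.Value concatenates the text of all nested elements (<a>, <i>, ...).
    public static string[] Extract(string markup)
    {
        XDocument doc = XDocument.Parse("<root>" + markup + "</root>");
        return doc.Descendants("p")
                  .Select(p => p.Value.Trim())
                  .Where(text => text.Length > 0)   // skip empty <p></p>
                  .ToArray();
    }

    public static void Main()
    {
        string sample = "<p>Main parts of a <a href=\"/wiki/Flower\">flower</a>.</p><p></p>" +
                        "<div><h2>Contents</h2></div>";
        foreach (string text in Extract(sample))
            Console.WriteLine(text);

        // Plain HTML that is not well-formed XML fails to parse.
        try
        {
            Extract("<p>Line one<br>line two</p>");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Not well-formed XML: " + ex.Message);
        }
    }
}
```

For real-world HTML pages, a dedicated HTML parser is usually the safer choice.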