Tuesday, January 5, 2010

All about HTML tags: 9 Regular Expressions to strip HTML tags

These are the most useful HTML-related regular expression examples I have found so far.
Reposted from
http://www.pagecolumn.com/tool/all_about_html_tags.htm





Quick syntax reference

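Flags
  • g - global match (find all matches, not just the first)
  • i - ignore case
  • m - multiline: ^ and $ match at the start and end of each line, not only of the whole string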
Escaping
  • \ - turns a special character into a literal one (e.g. \. matches a period) and certain literal characters into special ones (e.g. \d matches a digit)
Quantifiers

  • ? - matches zero or one times
  • * - matches zero or more times
  • + - matches one or more times
  • {n} - matches exactly n times
  • {n,m} - matches at least n times, but not more than m times (no space after the comma)
Anchors
  • ^ - matches at the start of the input (or of each line with the m flag)
  • $ - matches at the end of the input (or of each line with the m flag)
  • \b - matches at the beginning or the end of a word
Groups and Lookaheads
  • (?:x) - matches x but does not remember the match (non-capturing group)
  • x(?=y) - matches x only if x is followed by y
  • x(?!y) - matches x only if x is not followed by y
Character Escapes
  • \s - matches whitespace
  • \S - matches anything but a whitespace
  • \f - matches a form-feed
  • \n - matches a linefeed
  • \r - matches a carriage return
  • \t - matches a horizontal tab
  • \v - matches a vertical tab
  • \w - matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_]
  • \W - matches any non-word character. Equivalent to [^A-Za-z0-9_]
Others
  • . - matches any single character except a line terminator (\n, \r, \u2028, \u2029)
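A quick way to see several of these entries in action is to try them on short strings. The sketch below is not from the original page; the variable names are illustrative, and it assumes a standard JavaScript engine with regex literals written without the u flag (as in all the examples on this page).

```javascript
// Illustrative checks of a few entries from the quick syntax reference.

// g: return every match, not just the first one.
const allDigits = "a1b22c333".match(/\d+/g); // ["1", "22", "333"]

// i: ignore case.
const caseless = /div/i.test("<DIV>"); // true

// m: ^ and $ also match at line breaks inside the string.
const lineText = "first line\nsecond line";
const lineStarts = lineText.match(/^\w+/gm); // ["first", "second"]
const noMultiline = lineText.match(/^\w+/g); // ["first"]

// {n,m}: between n and m repetitions. Written with a space ("{2, 3}") it is
// not a quantifier at all; the braces and digits are matched as literal text.
const withinRange = /^a{2,3}$/.test("aaa"); // true
const spacedQuantifier = /^a{2, 3}$/.test("aaa"); // false

// x(?=y) and x(?!y): lookaheads test what follows without consuming it.
const beforePx = "5px 7em".match(/\d+(?=px)/g); // ["5"]
const notBeforePx = "5px 7em".match(/\d+(?!px)/g); // ["7"]

// (?:x): the alternation is grouped but not captured,
// so the only capture group is the digit.
const groupMatch = /<(?:h|H)(\d)>/.exec("<H4>");
console.log(groupMatch[1]); // 4
```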


Parsing the HTML of a whole page with regular expressions is hard and error-prone.
But if you are working with a fragment of HTML and handling it as a string, the following regular expressions may help.


1
matches a specific tag pair (here h4) and captures the content between the tags

RegEx Expression:
/<\s*h4[^>]*>(.*?)<\s*\/\s*h4>/g

Method:
exec, match
Testing String

<h4 class="sds">And more ...</h4>


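Because the pattern uses the g flag, match returns only the whole matches, so the text captured by (.*?) is easiest to read with exec in a loop. A minimal sketch, assuming the input is a string fragment; the function name h4Contents is hypothetical. Since . does not match a newline, an h4 whose content spans several lines is not found.

```javascript
// Collect the text between every <h4 ...> and </h4> pair.
// Hypothetical helper; the regex is the one from item 1.
function h4Contents(html) {
  const re = /<\s*h4[^>]*>(.*?)<\s*\/\s*h4>/g;
  const contents = [];
  let match;
  // With the g flag, each exec() call continues from re.lastIndex
  // and returns null once there are no more matches.
  while ((match = re.exec(html)) !== null) {
    contents.push(match[1]); // match[1] is the (.*?) capture group
  }
  return contents;
}

console.log(h4Contents('<h4 class="sds">And more ...</h4>')); // [ 'And more ...' ]
```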



2
matches every HTML tag (start and end tags alike), including the attributes inside it

RegEx Expression:
/<(.|\n)*?>/g
Method:
match
Testing String

<div class="tab0">CSS code formatter</div><div class="tab2">CSS code compressor</div>


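Since the title of the page talks about stripping tags, here is a sketch of the same pattern used both to list the tags with match and to remove them with replace. It assumes a small fragment like the test string; a > inside an attribute value or a comment would end a match early.

```javascript
// Item 2's pattern, used to list tags and to strip them.
const anyTagPattern = /<(.|\n)*?>/g;
const tabFragment =
  '<div class="tab0">CSS code formatter</div><div class="tab2">CSS code compressor</div>';

// match() with the g flag returns every tag as a string.
const allTags = tabFragment.match(anyTagPattern);
console.log(allTags);
// [ '<div class="tab0">', '</div>', '<div class="tab2">', '</div>' ]

// replace() with an empty string removes every tag and keeps the text.
const tabText = tabFragment.replace(anyTagPattern, "");
console.log(tabText); // CSS code formatterCSS code compressor
```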


3
matches all start tags, including their attributes

RegEx Expression:
/<\s*\w.*?>/g
Method:
match
Testing String
<div class="box">5 px radius of round corner</div><div class="box">7 px radius of round corner</div><div style="color:#6699cc">color</div>

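A possible follow-up is to read the element name out of each start tag. The helper pattern /<\s*(\w+)/ used for that step is an illustration added here, not part of the original list. End tags are skipped by item 3's pattern because \w cannot match the slash.

```javascript
// Item 3's pattern finds start tags; end tags are skipped because \w cannot match "/".
const startTagPattern = /<\s*\w.*?>/g;
const boxFragment =
  '<div class="box">5 px radius of round corner</div><div style="color:#6699cc">color</div><br>';

const startTags = boxFragment.match(startTagPattern);
// Illustrative extra step: capture the first run of word characters as the tag name.
const startTagNames = startTags.map((tag) => /<\s*(\w+)/.exec(tag)[1]);

console.log(startTags); // [ '<div class="box">', '<div style="color:#6699cc">', '<br>' ]
console.log(startTagNames); // [ 'div', 'div', 'br' ]
```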


4
matches all end tags, plus <br> line breaks

RegEx Expression:
/<\s*\/\s*\w\s*.*?>|<\s*br\s*>/g
Method:
match
Testing String

<div class="sds">not sure where it can be used</div></br>


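The sketch below runs the end-tag pattern on a fragment that mixes end tags with a <br>. The second alternative only covers the plain <br> spelling, so a self-closing <br/> or <br /> is not matched. The fragment and variable names are made up for illustration.

```javascript
// Item 4's pattern: any end tag, or a <br> written without a slash.
const endTagPattern = /<\s*\/\s*\w\s*.*?>|<\s*br\s*>/g;
const breakFragment = '<p>one<br>two</p><div class="sds">three</div>';

const endTags = breakFragment.match(endTagPattern);
console.log(endTags); // [ '<br>', '</p>', '</div>' ]

// Self-closing line breaks fall through both alternatives.
console.log("<br/> <br />".match(endTagPattern)); // null
```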


5
matches the start tag of a specific element (here div), including its attributes

RegEx Expression:
/<\s*div.*?>/g
Method:
match
Testing String

<div class="tab1">tabs generator</div>


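One caveat worth checking: nothing after div marks the end of the tag name, so the pattern also matches a start tag such as <divider>. The stricter variant in this sketch, which adds \b after the name and uses [^>]* instead of .*?, is a suggested refinement rather than something from the original page.

```javascript
// Item 5's pattern versus a stricter, suggested variant.
const divStartLoose = /<\s*div.*?>/g;
// Assumption: \b requires the tag name to end right after "div";
// [^>]* stops at the first ">" just as .*? does, but can also span newlines.
const divStartStrict = /<\s*div\b[^>]*>/g;
const divFragment = '<div class="tab1">tabs generator</div><divider>x</divider>';

console.log(divFragment.match(divStartLoose)); // [ '<div class="tab1">', '<divider>' ]
console.log(divFragment.match(divStartStrict)); // [ '<div class="tab1">' ]
```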


6
matches the end tag of a specific element (here div)

RegEx Expression:
/<\s*\/\s*div\s*.*?>/g
Method:
match
Testing String

<div class="sds">javascript + CSS ...</div>


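Paired with the start-tag pattern from item 5, the end-tag pattern can give a rough balance check by counting matches. This is only a count, not a nesting check, and divTagsBalanced is a hypothetical helper written for this sketch.

```javascript
// Hypothetical helper: compare how many <div> start tags and </div> end tags
// a fragment contains. Equal counts do not prove correct nesting.
function divTagsBalanced(html) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const opens = html.match(/<\s*div.*?>/g) || []; // item 5's pattern
  const closes = html.match(/<\s*\/\s*div\s*.*?>/g) || []; // item 6's pattern
  return opens.length === closes.length;
}

console.log(divTagsBalanced('<div class="sds">javascript + CSS ...</div>')); // true
console.log(divTagsBalanced("<div><div>missing one end tag</div>")); // false
```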


7
matches both the start and end tags of a specific element (here span), including attributes

RegEx Expression:
/<\s*\/?\s*span\s*.*?>/g
Method:
match
Testing String

<span class="csc">Regex examples</span>


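Because the optional \/? lets one pattern match both halves of the pair, a single replace can unwrap every span while keeping its text. A sketch with a hypothetical unwrapSpans helper; like item 5, the pattern would also match tags whose names merely start with span.

```javascript
// Hypothetical helper: remove <span ...> and </span> tags, keeping the text inside.
// \/? makes the slash optional, so one pattern covers both halves of the pair.
function unwrapSpans(html) {
  return html.replace(/<\s*\/?\s*span\s*.*?>/g, "");
}

console.log(unwrapSpans('<p><span class="csc">Regex examples</span></p>'));
// <p>Regex examples</p>
```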

8
matches a start tag whose first attribute is a specific one (here style)

RegEx Expression:
/<\s*\w*\s*style.*?>/g
Method:
match
Testing String

<div style="color:#6699cc">round corner</div>


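Item 8's pattern only matches when style comes directly after the tag name, since nothing but whitespace may sit between the two. The sketch shows that limitation and a looser variant (an assumption added here, not from the original page) that finds style anywhere in the tag and captures its double-quoted value.

```javascript
// Item 8's pattern: style must come directly after the tag name.
const styleFirstPattern = /<\s*\w*\s*style.*?>/g;
// Suggested looser variant (an assumption, not from the original page):
// style may appear anywhere in the tag after whitespace, and its
// double-quoted value is captured.
const styleAnywherePattern = /<\s*\w+[^>]*\sstyle\s*=\s*"([^"]*)"[^>]*>/g;

const styleFragment =
  '<div style="color:#6699cc">round corner</div><p class="x" style="margin:0">text</p>';

const styleFirstTags = styleFragment.match(styleFirstPattern);
console.log(styleFirstTags);
// [ '<div style="color:#6699cc">' ]  (the <p> is skipped: class comes first)

const styleValues = Array.from(styleFragment.matchAll(styleAnywherePattern), (m) => m[1]);
console.log(styleValues); // [ 'color:#6699cc', 'margin:0' ]
```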

9
matches a start tag whose first attribute is href, capturing the URL

RegEx Expression:
/<\s*\w*\s*href\s*=\s*"?\s*([\w\s%#\/\.;:_-]*)\s*"?.*?>/g
Method:
exec, match
Testing String
<span ><a href="http://www.pagecolumn.com/">3 Column Layout Generator </a></span>
<span ><a href="http://www.pagecolumn.com/2_col_generator.htm">2 Column Layout Generator</a></span>

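Reading the capture group with exec gives the URLs themselves. The sketch assumes double-quoted href values and uses a hypothetical extractHrefs helper. Because ?, = and & are not in the character class, a URL with a query string is cut off at the ?, and href must be the first attribute of the tag.

```javascript
// Hypothetical helper: return the captured href value for every tag
// matched by item 9's pattern.
function extractHrefs(html) {
  const re = /<\s*\w*\s*href\s*=\s*"?\s*([\w\s%#\/\.;:_-]*)\s*"?.*?>/g;
  const urls = [];
  let match;
  while ((match = re.exec(html)) !== null) {
    urls.push(match[1]); // only the characters allowed by the [...] class
  }
  return urls;
}

const linkFragment =
  '<span ><a href="http://www.pagecolumn.com/">3 Column Layout Generator </a></span> ' +
  '<span ><a href="http://www.pagecolumn.com/2_col_generator.htm">2 Column Layout Generator</a></span>';

console.log(extractHrefs(linkFragment));
// [ 'http://www.pagecolumn.com/', 'http://www.pagecolumn.com/2_col_generator.htm' ]

// The class has no "?", "=" or "&", so a query string is cut off.
console.log(extractHrefs('<a href="/search?q=regex">search</a>')); // [ '/search' ]
```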
