楼主 xmyjk |
Q:如何运用VBA批量下载CNTV网页新闻至TXT文本? 提取http://news.cntv.cn/program/xwlb/20110101/103918.shtml系列网址的网页新闻。要求如下: 1、提取网页内的主标题和正文内容。
2、需可导入N个网页的文本合并成一个TXT文本输出,合并的顺序按网页的顺序从上到下排列,网页内容和网页内容之间空一行。
3、如果可能,请加一个网页地址批量输入窗口,这样可以一次性黏贴进多条网页地址,并批量建立超链接。
A:下载代码:- Option Explicit
- Sub T()
- Dim tmp() As String, i As Integer, arr() As String, xmlhttp As Object, P As Long, getpage As String, TITLE As String
-
- If Dir(ThisWorkbook.Path & "\XIAZAI.txt") <> "" Then Kill ThisWorkbook.Path & "\XIAZAI.txt"
- For P = 2 To [A65536].End(3).Row
- Set xmlhttp = CreateObject("Microsoft.XMLHTTP")
- With xmlhttp
- .Open "get", Trim(Cells(P, 2).Value), False
- .send
- tmp = Filter(Split(Replace(Replace(StrConv(.responsebody, vbUnicode, &H804), "<P><STRONG>", ""), " ", ""), "</P>"), "<P>")
- TITLE = Split(Split(StrConv(.responsebody, vbUnicode, &H804), "<h1 class=""b-tit"">")(1), "</h1>")(0)
- End With
- ReDim arr(UBound(tmp))
- For i = 0 To UBound(arr)
- arr(i) = Split(tmp(i), "<P>")(1)
- Next
- getpage = Replace(Replace(Replace(Join(arr, vbCrLf), "<BR>", ""), "<STRONG>", ""), "</STRONG>", "")
- Erase tmp
- Erase arr
-
- Open ThisWorkbook.Path & "\XIAZAI.txt" For Append As #1
- Print #1, vbCrLf & vbCrLf
- Print #1, TITLE
- Print #1, getpage
- Close #1
-
- getpage = ""
- TITLE = ""
-
- Next
- Set xmlhttp = Nothing
-
- MsgBox "Ok"
- End Sub
导入网址代码:- Private Sub CommandButton1_Click()
- Dim arr() As String
- If InStr(TextBox1.Text, vbCrLf) > 0 Then
- arr = Split(TextBox1.Text, vbCrLf)
- Else
- ReDim arr(0): arr(0) = TextBox1.Text
- End If
- For i = 0 To UBound(arr)
- ActiveSheet.Hyperlinks.Add Anchor:=Cells(i + 2, 2), Address:=arr(i), TextToDisplay:=arr(i)
- Next
- End Sub
该帖已经同步到 网页内容另存TXT文本.rar |