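Today I'm sharing four Python scripts. Each one periodically fetches the RSS feed of one of four media sites (Solidot, IT之家, Linux中国, and 抽屉·挨踢1024) and sends a digest email on a schedule.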
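Notes:
- Deployment uses Tencent Cloud Functions; for the exact deployment steps, see the deployment article on this site.
- When setting the scheduled trigger in Tencent Cloud Functions, I suggest around 22:00 for Solidot, because it usually stops updating a little after 9 pm. For IT之家 I suggest 23:59, because it publishes almost around the clock; with this setting you can still read the email the next morning even if you go to bed early. Linux中国 can also be set to around 22:00, because it usually publishes three to five articles around 3 or 4 pm. The same goes for 抽屉·挨踢1024, which I usually read in the evening.
- For the Linux中国 script below, install one extra library, `user-agent`, at the `pip install` step of the deployment article mentioned above.
- The 抽屉1024 code must be deployed to an overseas region such as Silicon Valley. It uses an RSSHub feed link, and RSSHub apparently can no longer be reached from mainland China without a proxy.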
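The trigger times above only help if the script's idea of "today" matches the timezone you schedule in. Here is a minimal sketch, assuming dates should be judged in Beijing time (UTC+8) and that the function host's clock may be set to a different zone such as UTC. `BEIJING` and `today_string` are hypothetical names, not part of the scripts below.

```python
import datetime

# Assumption: dates should be evaluated in Beijing time (UTC+8, no DST).
BEIJING = datetime.timezone(datetime.timedelta(hours=8))


def today_string(now=None):
    """Return the date as YYYY-MM-DD in Beijing time.

    `now` may be any timezone-aware datetime; it defaults to the current time.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now.astimezone(BEIJING).strftime('%Y-%m-%d')


# A 23:59 Beijing-time trigger on 2020-01-01 fires at 15:59 UTC the same day.
trigger = datetime.datetime(2020, 1, 1, 15, 59, tzinfo=datetime.timezone.utc)
print(today_string(trigger))
```

A helper like this could stand in for the `time.strftime("%Y-%m-%d", time.localtime())` call in the scripts if the host timezone turns out not to be the one you expect.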
solidot
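In fact, this site's hands-on Tencent Cloud Functions article used Solidot as its example. The code there just lacks a check for whether an article was published today. Because the change is fairly large, I'm posting the new code in full below; otherwise I would simply have told you to copy the code from that article.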
```python
#!/usr/bin/env python3
# coding=utf-8
import re
import sys
import time
import smtplib
import requests
import datetime
from bs4 import BeautifulSoup
from email.mime.text import MIMEText

# SMTP settings: fill in your own sender, receiver, and password.
HOST = 'smtp.126.com'
PORT = 25
SENDER = '@126.com'
RECEIVER = '@qq.com'
PWD = ''

# Today's date as YYYY-MM-DD. Note: this is evaluated once at import time,
# so it would be stale if the runtime reused a warm instance across days.
current_time = time.strftime("%Y-%m-%d", time.localtime())


def english_time_to_num(time_str):
    # Extract e.g. '01 Jan 2020' from an RSS pubDate and convert it to '2020-01-01'.
    result = re.search(r'(\d+ \w+ \d+)', time_str).group(1)
    time_format = datetime.datetime.strptime(result, '%d %b %Y')
    time_format = time_format.strftime('%Y-%m-%d')
    return time_format


def mail_send(subject, mail_body):
    try:
        msg = MIMEText(mail_body, 'plain', 'utf-8')
        msg['Subject'] = subject
        msg['From'] = SENDER
        msg['To'] = RECEIVER
        s = smtplib.SMTP(HOST, PORT)
        s.debuglevel = 0
        s.login(SENDER, PWD)
        s.sendmail(SENDER, RECEIVER, msg.as_string())
        s.quit()
    except smtplib.SMTPException as e:
        print(str(e))
        # sys.exit rather than exit(), which is meant for the interactive shell.
        sys.exit(1)


def get_soup():
    url = 'https://www.solidot.org/index.rss'
    rss_xml = requests.get(url).text
    # The 'xml' parser requires the lxml package.
    soup = BeautifulSoup(rss_xml, 'xml')
    return soup


def get_mail_body():
    contents = get_soup().select('item')
    contents_list = []
    for c in contents:
        pub_date = c.select_one('pubDate').get_text()
        pub_date_to_num = english_time_to_num(pub_date)
        # Keep only the articles published today.
        if pub_date_to_num == current_time:
            title = c.select_one('title').get_text()
            link = c.select_one('link').get_text()
            contents_list.append(title + '\n' + link)
    return '\n'.join(contents_list)


def main(arg1, arg2):
    # Entry point called by the cloud function; the subject reads "Solidot: today's articles".
    mail_send(subject=current_time + ' Solidot今日文章', mail_body=get_mail_body())
    print('成功发送了一封邮件!')  # "Successfully sent an email!"
```
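The "published today" check is the new part of this script, so it may help to see it in isolation. The sketch below is not the script's own method. It swaps BeautifulSoup and the regular expression for the standard library (`xml.etree.ElementTree` and `email.utils.parsedate_to_datetime`) so it runs without any installs. `SAMPLE_RSS` is an invented two-item feed and `items_published_on` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed standing in for a real RSS 2.0 response.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Today's article</title>
      <link>https://example.com/today</link>
      <pubDate>Wed, 01 Jan 2020 09:30:00 +0800</pubDate>
    </item>
    <item>
      <title>Yesterday's article</title>
      <link>https://example.com/yesterday</link>
      <pubDate>Tue, 31 Dec 2019 21:10:00 +0800</pubDate>
    </item>
  </channel>
</rss>"""


def items_published_on(rss_xml, day):
    """Return title-plus-link strings for items whose pubDate falls on `day` (YYYY-MM-DD)."""
    root = ET.fromstring(rss_xml)
    matches = []
    for item in root.iter('item'):
        # parsedate_to_datetime understands the RFC 822 dates used by RSS 2.0.
        published = parsedate_to_datetime(item.findtext('pubDate'))
        if published.strftime('%Y-%m-%d') == day:
            matches.append(item.findtext('title') + '\n' + item.findtext('link'))
    return matches


print('\n'.join(items_published_on(SAMPLE_RSS, '2020-01-01')))
```

The date is compared in the offset the feed itself reports, which mirrors what the regular expression in the script does when it reads only the day, month, and year.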
IT之家
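Just swap the URL and the email subject string in the Solidot code above. Both feeds use the same RSS layout (`item` elements containing `title`, `link`, and `pubDate`), so the code can be reused.

Specifically:

Replace the URL below with https://www.ithome.com/rss/

```python
def get_soup():
    url = 'https://www.solidot.org/index.rss'
```

Replace the email subject below with ' IT之家今日文章'

```python
def main(arg1, arg2):
    mail_send(subject=current_time + ' Solidot今日文章', mail_body=get_mail_body())
    print('成功发送了一封邮件!')
```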
抽屉·挨踢1024
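As with IT之家 above, just swap the URL and the email subject string in the Solidot code. This feed also uses the same RSS layout (`item` elements containing `title`, `link`, and `pubDate`), so the code can be reused.

Specifically:

Replace the URL below with https://rsshub.app/chouti/tec

```python
def get_soup():
    url = 'https://www.solidot.org/index.rss'
```

Replace the email subject below with '抽屉挨踢1024今日文章'

```python
def main(arg1, arg2):
    mail_send(subject=current_time + ' Solidot今日文章', mail_body=get_mail_body())
    print('成功发送了一封邮件!')
```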
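Since the three scripts so far differ only in the feed URL and the subject string, those two values could live in a small table instead of being edited by hand. This is only a sketch of that idea: the URLs and subject strings are the ones given above, while the `FEEDS` table, its keys, and `feed_settings` are hypothetical names.

```python
# URLs and subject suffixes are taken from the sections above; the FEEDS
# table, its keys, and feed_settings() are hypothetical, for illustration only.
FEEDS = {
    'solidot': ('https://www.solidot.org/index.rss', ' Solidot今日文章'),
    'ithome': ('https://www.ithome.com/rss/', ' IT之家今日文章'),
    'chouti': ('https://rsshub.app/chouti/tec', '抽屉挨踢1024今日文章'),
}


def feed_settings(name, current_time):
    """Return (url, subject) for one feed on the given YYYY-MM-DD date."""
    url, subject_suffix = FEEDS[name]
    return url, current_time + subject_suffix


url, subject = feed_settings('ithome', '2020-01-01')
print(url)
print(subject)
```

Each cloud function would still be deployed separately (the 抽屉 one in an overseas region), but all of them could share one file and differ only in the key passed to `feed_settings`.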
Linux中国
Linux中国 requires HTTP headers to be set, which means changes in quite a few places, so instead of describing how to modify the code I'm posting the complete script below.

```python
#!/usr/bin/env python3
# coding=utf-8
import re
import sys
import time
import smtplib
import requests
import datetime
from bs4 import BeautifulSoup
from email.mime.text import MIMEText
from user_agent import generate_user_agent

# SMTP settings: fill in your own sender, receiver, and password.
HOST = 'smtp.126.com'
PORT = 25
SENDER = '@126.com'
RECEIVER = '@qq.com'
PWD = ''
# Browser-like request headers with a randomly generated Windows User-Agent.
# 'br' is left out of accept-encoding: requests can only decode Brotli
# responses when an extra Brotli package is installed.
HEADERS = {
    'accept': "text/html,application/xhtml+xml,application/xml"
              ";q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'upgrade-insecure-requests': '1',
    'accept-encoding': 'gzip, deflate',
    'user-agent': generate_user_agent(os='win')}

# Today's date as YYYY-MM-DD. Note: this is evaluated once at import time,
# so it would be stale if the runtime reused a warm instance across days.
current_time = time.strftime("%Y-%m-%d", time.localtime())


def english_time_to_num(time_str):
    # Extract e.g. '01 Jan 2020' from an RSS pubDate and convert it to '2020-01-01'.
    result = re.search(r'(\d+ \w+ \d+)', time_str).group(1)
    time_format = datetime.datetime.strptime(result, '%d %b %Y')
    time_format = time_format.strftime('%Y-%m-%d')
    return time_format


def mail_send(subject, mail_body):
    try:
        msg = MIMEText(mail_body, 'plain', 'utf-8')
        msg['Subject'] = subject
        msg['From'] = SENDER
        msg['To'] = RECEIVER
        s = smtplib.SMTP(HOST, PORT)
        s.debuglevel = 0
        s.login(SENDER, PWD)
        s.sendmail(SENDER, RECEIVER, msg.as_string())
        s.quit()
    except smtplib.SMTPException as e:
        print(str(e))
        # sys.exit rather than exit(), which is meant for the interactive shell.
        sys.exit(1)


def get_soup():
    url = 'https://linux.cn/rss.xml'
    rss_xml = requests.get(url, headers=HEADERS).text
    # The 'xml' parser requires the lxml package.
    soup = BeautifulSoup(rss_xml, 'xml')
    return soup


def get_mail_body():
    contents = get_soup().select('item')
    contents_list = []
    for c in contents:
        pub_date = c.select_one('pubDate').get_text()
        pub_date_to_num = english_time_to_num(pub_date)
        # Keep only the articles published today.
        if pub_date_to_num == current_time:
            title = c.select_one('title').get_text()
            link = c.select_one('link').get_text()
            contents_list.append(title + '\n' + link)
    return '\n'.join(contents_list)


def main(arg1, arg2):
    # Entry point called by the cloud function; the subject reads "Linux中国: today's articles".
    mail_send(subject=current_time + ' Linux中国今日文章', mail_body=get_mail_body())
    print('成功发送了一封邮件!')  # "Successfully sent an email!"
```
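If you would rather not install the extra `user-agent` library, a fixed header set is one possible fallback. This sketch rests on an assumption the article does not confirm: that the server accepts a static browser-like User-Agent. The UA string is an illustrative value, and `FALLBACK_HEADERS` and `build_request` are hypothetical names. It uses the standard library's `urllib.request` instead of `requests` so that it runs without installs, and it only builds the request without sending it.

```python
import urllib.request

# Assumption: a fixed browser-like User-Agent is acceptable to the server.
# The string below is an illustrative value, not one taken from the article.
FALLBACK_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
}


def build_request(url, headers=FALLBACK_HEADERS):
    """Create (but do not send) a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=headers)


request = build_request('https://linux.cn/rss.xml')
# urllib normalizes stored header names to the form 'User-agent'.
print(request.get_header('User-agent'))
```

The same dictionary could also be passed as `headers=` to `requests.get` in the script above in place of `HEADERS`.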
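Below is a screenshot taken while debugging the Linux中国 script:
One last reminder on how to deploy this code: be sure to follow the deployment article on this site.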