這篇文章給大家分享的是有關(guān)scrapy怎么樣測(cè)試python爬蟲的數(shù)據(jù)的內(nèi)容。小編覺得挺實(shí)用的,因此分享給大家做個(gè)參考。一起跟隨小編過來看看吧。
目前創(chuàng)新互聯(lián)已為上千家的企業(yè)提供了網(wǎng)站建設(shè)、域名、雅安服務(wù)器托管、網(wǎng)站托管、服務(wù)器租用、企業(yè)網(wǎng)站設(shè)計(jì)、翁源網(wǎng)站維護(hù)等服務(wù),公司將堅(jiān)持客戶導(dǎo)向、應(yīng)用為本的策略,正道將秉承"和諧、參與、激情"的文化,與客戶和合作伙伴齊心協(xié)力一起成長,共同發(fā)展。進(jìn)入到項(xiàng)目的根目錄下,運(yùn)行以下命令:
# 進(jìn)入到項(xiàng)目目錄 # cd /work/Code/scraper/TweetScraper scrapy crawl TweetScraper -a query="Novel coronavirus,#COVID-19"
注意,抓取Twitter的數(shù)據(jù)需要科學(xué)上網(wǎng)或者服務(wù)器部署在海外,所以使用的是海外的服務(wù)器。
[root@cs TweetScraper]# scrapy crawl TweetScraper -a query="Novel coronavirus,#COVID-19" 2020-04-16 19:22:40 [scrapy.utils.log] INFO: Scrapy 2.0.1 started (bot: TweetScraper) 2020-04-16 19:22:40 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.8, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) - [GCC 7.2.0], pyOpenSSL 18.0.0 (OpenSSL 1.0.2o 27 Mar 2018), cryptography 2.2.2, Platform Linux-3.10.0-862.el7.x86_64-x86_64-with-centos-7.5.1804-Core 2020-04-16 19:22:40 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'TweetScraper', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'TweetScraper.spiders', 'SPIDER_MODULES': ['TweetScraper.spiders'], 'USER_AGENT': 'TweetScraper'} 2020-04-16 19:22:40 [scrapy.extensions.telnet] INFO: Telnet Password: 1fb55da389e595db 2020-04-16 19:22:40 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats'] 2020-04-16 19:22:41 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats'] 2020-04-16 19:22:41 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] Mysql連接成功###################################### MySQLCursorBuffered: (Nothing executed yet) 2020-04-16 19:22:41 [TweetScraper.pipelines] INFO: Table 'tweets' already exists 2020-04-16 19:22:41 [scrapy.middleware] INFO: Enabled item pipelines: ['TweetScraper.pipelines.SavetoMySQLPipeline'] 2020-04-16 19:22:41 [scrapy.core.engine] INFO: Spider opened 2020-04-16 19:22:41 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2020-04-16 19:22:41 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 2020-04-16 19:23:45 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 11 items (at 11 items/min) 2020-04-16 19:24:44 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 1 pages/min), scraped 22 items (at 11 items/min) ^C2020-04-16 19:26:27 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force 2020-04-16 19:26:27 [scrapy.core.engine] INFO: Closing spider (shutdown) 2020-04-16 19:26:43 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 1 pages/min), scraped 44 items (at 11 items/min)
我們可以看到,該項(xiàng)目運(yùn)行OK,抓取到的數(shù)據(jù)也已經(jīng)被保存在數(shù)據(jù)庫了。
感謝各位的閱讀!關(guān)于scrapy怎么樣測(cè)試python爬蟲的數(shù)據(jù)就分享到這里了,希望以上內(nèi)容可以對(duì)大家有一定的幫助,讓大家可以學(xué)到更多知識(shí)。如果覺得文章不錯(cuò),可以把它分享出去讓更多的人看到吧!
標(biāo)題名稱:scrapy怎么樣測(cè)試python爬蟲的數(shù)據(jù)-創(chuàng)新互聯(lián)
路徑分享:http://jinyejixie.com/article32/csdesc.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供網(wǎng)站設(shè)計(jì)公司、云服務(wù)器、移動(dòng)網(wǎng)站建設(shè)、搜索引擎優(yōu)化、網(wǎng)站營銷、做網(wǎng)站
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來源: 創(chuàng)新互聯(lián)
猜你還喜歡下面的內(nèi)容