This article walks through a worked example of analyzing website access logs with Hadoop. The steps are simple and clear; follow along to see the full pipeline from raw log lines to a browser-usage report.
I. Project Requirements
"Logs" here means web logs only. There is no precise definition: the term may include, but is not limited to, the user access logs produced by front-end web servers such as Apache, lighttpd, Nginx, and Tomcat, as well as logs written by web applications themselves.
II. Requirements Analysis: KPI Design

PV (PageView): page-view counts
IP: unique-IP visit counts per page
Time: hourly PV counts
Source: referrer-domain counts
Browser: visitor browser/device counts

Below I focus on the browser statistic; the other KPIs follow the same map/reduce pattern, and a sketch for the hourly-PV case follows.
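To show how another KPI fits the same pattern, here is a minimal, hypothetical mapper sketch for the Time KPI (my own illustration, not part of the original project). It buckets each record by hour and emits a count of 1, and it assumes the Kpi and KpiUtil classes defined in steps 4 and 5 below; the paired reducer would be the same summing reducer shown in step 7.

public static class HourlyPvMapper extends MapReduceBase
        implements Mapper<Object, Text, Text, IntWritable> {
    @Override
    public void map(Object key, Text value,
            OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
        Kpi kpi = KpiUtil.transformLineKpi(value.toString());
        if (kpi != null && kpi.getTime_local() != null) {
            // time_local looks like "18/Sep/2013:06:49:57"; the hour is the
            // two characters after the first ':'.
            String t = kpi.getTime_local();
            int colon = t.indexOf(':');
            out.collect(new Text(t.substring(colon + 1, colon + 3)),
                    new IntWritable(1));
        }
    }
}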
III. Analysis Process
1. A sample Nginx log record
222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "GET /images/my.jpg HTTP/1.1" 200 19939
"http://www.angularjs.cn/A00n"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
2. Breaking down the record above

remote_addr: the client's IP address, 222.68.172.190
remote_user: the client's user name, -
time_local: access time and time zone, [18/Sep/2013:06:49:57 +0000]
request: the requested URL and HTTP protocol, "GET /images/my.jpg HTTP/1.1"
status: the request status; 200 means success, 200
body_bytes_sent: size of the response body sent to the client, 19939
http_referer: the page the request was linked from, "http://www.angularjs.cn/A00n"
http_user_agent: the client's browser information, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
3. Parsing a log record in Java (splitting on spaces)
String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] \"GET /images/my.jpg HTTP/1.1\" 200 19939 \"http://www.angularjs.cn/A00n\" \"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36\"";
String[] elementList = line.split(" ");
for (int i = 0; i < elementList.length; i++) {
    System.out.println(i + " : " + elementList[i]);
}
Test output:
0 : 222.68.172.190
1 : -
2 : -
3 : [18/Sep/2013:06:49:57
4 : +0000]
5 : "GET
6 : /images/my.jpg
7 : HTTP/1.1"
8 : 200
9 : 19939
10 : "http://www.angularjs.cn/A00n"
11 : "Mozilla/5.0
12 : (Windows
13 : NT
14 : 6.1)
15 : AppleWebKit/537.36
16 : (KHTML,
17 : like
18 : Gecko)
19 : Chrome/29.0.1547.66
20 : Safari/537.36"
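As the indices above show, splitting on bare spaces shatters the bracketed timestamp and the quoted user agent across many tokens. A regex that treats [...] and "..." as single fields avoids this; the following is my own sketch (not part of the original code), reusing the line variable from the snippet above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative parser: each field is one capture group, so the
// timestamp and quoted fields stay intact.
Pattern p = Pattern.compile(
    "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");
Matcher m = p.matcher(line);
if (m.matches()) {
    System.out.println("remote_addr     : " + m.group(1));
    System.out.println("time_local      : " + m.group(4));
    System.out.println("request         : " + m.group(5));
    System.out.println("status          : " + m.group(6));
    System.out.println("body_bytes_sent : " + m.group(7));
    System.out.println("http_referer    : " + m.group(8));
    System.out.println("http_user_agent : " + m.group(9));
}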
4. The Kpi entity class:
public class Kpi {
    private String remote_addr;      // the client's IP address
    private String remote_user;      // the client's user name; "-" when absent
    private String time_local;       // access time and time zone
    private String request;          // the requested URL and HTTP protocol
    private String status;           // request status; 200 means success
    private String body_bytes_sent;  // size of the response body sent to the client
    private String http_referer;     // the page the request was linked from
    private String http_user_agent;  // the client's browser information
    private String method;           // request method: GET, POST, ...
    private String http_version;     // HTTP version

    public String getMethod() { return method; }
    public void setMethod(String method) { this.method = method; }
    public String getHttp_version() { return http_version; }
    public void setHttp_version(String http_version) { this.http_version = http_version; }
    public String getRemote_addr() { return remote_addr; }
    public void setRemote_addr(String remote_addr) { this.remote_addr = remote_addr; }
    public String getRemote_user() { return remote_user; }
    public void setRemote_user(String remote_user) { this.remote_user = remote_user; }
    public String getTime_local() { return time_local; }
    public void setTime_local(String time_local) { this.time_local = time_local; }
    public String getRequest() { return request; }
    public void setRequest(String request) { this.request = request; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
    public String getBody_bytes_sent() { return body_bytes_sent; }
    public void setBody_bytes_sent(String body_bytes_sent) { this.body_bytes_sent = body_bytes_sent; }
    public String getHttp_referer() { return http_referer; }
    public void setHttp_referer(String http_referer) { this.http_referer = http_referer; }
    public String getHttp_user_agent() { return http_user_agent; }
    public void setHttp_user_agent(String http_user_agent) { this.http_user_agent = http_user_agent; }

    @Override
    public String toString() {
        return "Kpi [remote_addr=" + remote_addr + ", remote_user=" + remote_user
                + ", time_local=" + time_local + ", request=" + request
                + ", status=" + status + ", body_bytes_sent=" + body_bytes_sent
                + ", http_referer=" + http_referer + ", http_user_agent=" + http_user_agent
                + ", method=" + method + ", http_version=" + http_version + "]";
    }
}
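A note on the design: every field is stored as a String, which keeps the parser trivial; in a larger project, status and body_bytes_sent would more naturally be numeric types, at the cost of having to handle malformed records.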
5. The KPI utility class
package org.aaa.kpi;

public class KpiUtil {
    /***
     * Turn one log line into a Kpi object.
     * @param line a single log record
     * @author tianbx
     */
    public static Kpi transformLineKpi(String line) {
        String[] elementList = line.split(" ");
        Kpi kpi = new Kpi();
        kpi.setRemote_addr(elementList[0]);
        kpi.setRemote_user(elementList[1]);
        kpi.setTime_local(elementList[3].substring(1));  // drop the leading "["
        kpi.setMethod(elementList[5].substring(1));      // drop the leading quote
        kpi.setRequest(elementList[6]);
        // drop the trailing quote from HTTP/1.1" (the original kept it)
        kpi.setHttp_version(elementList[7].substring(0, elementList[7].length() - 1));
        kpi.setStatus(elementList[8]);
        kpi.setBody_bytes_sent(elementList[9]);
        kpi.setHttp_referer(elementList[10]);
        // The user agent itself contains spaces, so rejoin every remaining
        // token. (The original kept only elementList[11] + " " + elementList[12],
        // which truncated the agent string.)
        StringBuilder agent = new StringBuilder(elementList[11]);
        for (int i = 12; i < elementList.length; i++) {
            agent.append(" ").append(elementList[i]);
        }
        kpi.setHttp_user_agent(agent.toString());
        return kpi;
    }
}
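A quick, hypothetical smoke test (my own addition, not in the original article): feed the sample record from step 1 through transformLineKpi and print the result via Kpi.toString().

package org.aaa.kpi;

public class KpiUtilTest {
    public static void main(String[] args) {
        String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "
                + "\"GET /images/my.jpg HTTP/1.1\" 200 19939 "
                + "\"http://www.angularjs.cn/A00n\" "
                + "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36\"";
        Kpi kpi = KpiUtil.transformLineKpi(line);
        // Expect method=GET, status=200, http_version=HTTP/1.1, and the
        // full user-agent string in http_user_agent.
        System.out.println(kpi);
    }
}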
6. Algorithm model: parallel computation

Browser: visitor browser statistics
- Map: {key: $http_user_agent, value: 1}
- Reduce: {key: $http_user_agent, value: sum}
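Before the Hadoop version, here is a minimal single-machine sketch (my own illustration, not from the original article) of what this map/reduce pair computes: every record contributes a (user_agent, 1) pair, and the reduce step sums the 1s per key.

import java.util.HashMap;
import java.util.Map;

// Hypothetical single-JVM illustration of the model above: "map" emits
// (agent, 1) per record, "reduce" sums the ones per key.
public class BrowserCountSketch {
    public static void main(String[] args) {
        String[] agents = { "Chrome", "Chrome", "Firefox", "Chrome" };
        Map<String, Integer> counts = new HashMap<>();
        for (String agent : agents) {
            counts.merge(agent, 1, Integer::sum);  // the reduce step
        }
        System.out.println(counts);  // {Firefox=1, Chrome=3}
    }
}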
7. The MapReduce analysis code
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
// Note: these two package names differ from the org.aaa.kpi package declared
// on KpiUtil in step 5; adjust them to your own project layout.
import org.hmahout.kpi.entity.Kpi;
import org.hmahout.kpi.util.KpiUtil;

import cz.mallat.uasparser.UASparser;
import cz.mallat.uasparser.UserAgentInfo;

public class KpiBrowserSimpleV {

    public static class KpiBrowserSimpleMapper extends MapReduceBase
            implements Mapper<Object, Text, Text, IntWritable> {
        UASparser parser = null;

        @Override
        public void map(Object key, Text value,
                OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            Kpi kpi = KpiUtil.transformLineKpi(value.toString());
            // The original called kpi.getHttP_user_agent_info(), which does not
            // exist on Kpi; the getter defined in step 4 is getHttp_user_agent().
            if (kpi != null && kpi.getHttp_user_agent() != null) {
                if (parser == null) {
                    parser = new UASparser();  // lazily create the UA parser
                }
                UserAgentInfo info = parser.parseBrowserOnly(kpi.getHttp_user_agent());
                if ("unknown".equals(info.getUaName())) {
                    out.collect(new Text(info.getUaName()), new IntWritable(1));
                } else {
                    out.collect(new Text(info.getUaFamily()), new IntWritable(1));
                }
            }
        }
    }

    public static class KpiBrowserSimpleReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> value,
                OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            IntWritable sum = new IntWritable(0);
            while (value.hasNext()) {
                sum.set(sum.get() + value.next().get());
            }
            out.collect(key, sum);
        }
    }

    public static void main(String[] args) throws IOException {
        String input = "hdfs://127.0.0.1:9000/user/tianbx/log_kpi/input";
        String output = "hdfs://127.0.0.1:9000/user/tianbx/log_kpi/browerSimpleV";

        JobConf conf = new JobConf(KpiBrowserSimpleV.class);
        conf.setJobName("KpiBrowserSimpleV");
        String url = "classpath:";
        conf.addResource(url + "/hadoop/core-site.xml");
        conf.addResource(url + "/hadoop/hdfs-site.xml");
        conf.addResource(url + "/hadoop/mapred-site.xml");

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(KpiBrowserSimpleMapper.class);
        conf.setCombinerClass(KpiBrowserSimpleReducer.class);
        conf.setReducerClass(KpiBrowserSimpleReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}
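Assuming the job and its dependencies (including the uasparser library) are packaged into one jar, it would typically be launched with something like hadoop jar kpi-job.jar KpiBrowserSimpleV, where the jar name here is hypothetical. One practical caveat: the output directory must not already exist in HDFS, or the job will fail at submission.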
8. Contents of the output file log_kpi/browerSimpleV
AOL Explorer 1
Android Webkit 123
Chrome 4867
CoolNovo 23
Firefox 1700
Google App Engine 5
IE 1521
Jakarta Commons-HttpClient 3
Maxthon 27
Mobile Safari 273
Mozilla 130
Openwave Mobile Browser 2
Opera 2
Pale Moon 1
Python-urllib 4
Safari 246
Sogou Explorer 157
unknown 4685
9. Plotting the result in R
library(ggplot2)  # qplot() comes from ggplot2; the original omitted this

data <- read.table(file = "borwer.txt", header = FALSE, sep = ",")
names(data) <- c("borwer", "num")

# geom = "bar" with an explicit y needs stat = "identity", so the bar heights
# are taken directly from the num column
qplot(borwer, num, data = data, geom = "bar", stat = "identity")
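One step the article leaves implicit is how borwer.txt is produced. TextOutputFormat writes tab-separated key/value pairs, so the reducer output presumably has to be pulled out of HDFS and converted to commas first, for example (the exact path is my guess based on the output setting in step 7): hadoop fs -cat hdfs://127.0.0.1:9000/user/tianbx/log_kpi/browerSimpleV/part-00000 | tr '\t' ',' > borwer.txt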
Thanks for reading. That wraps up this worked example of Hadoop website-log analysis; the steps above should be straightforward to reproduce, and running them yourself is the best way to verify the details.