Analyzing User News Browsing

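For a project at my company, I was lucky enough to get one day of user browsing records provided by Sohu. After processing, the data looks like this: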

10000082	12002235,12002254,12002273,12002231,12002229
10000169	12002684
10000170	11990159,11964438,11967826,11962239,11993674,11994700,11988874,11980097,11869151,11989021,11988798
10000197	12003720,12005653,11995420,12005687,12002684,12004640,11922834,11996252,11993003,11992541
10000207	12003312,12003720,11993003,11988798,11989642,11981421,11987264,11992722,11991992,11990421,12003198,11993438,12002683,12003945,12003310,12002358,12007144,11823605,11822852
10000396	11989070
10000406	11991992,11993001,11986248,11993056,12002633,12002684,12003198,12003945
10000472	11984764,11935680,11958279,11885528,11945566
10000497	11988838,11979647,11963203,11975444,11976523,11976446,11974896,11978179,11977404,11994038,11989505,12001199,11992727,11989070,11989719,11989544,11990159,12003107,11963210,12004131,12006906,11980456,11989247,11989089,11988494,11978575,11977213,11976315,11964623,11964802,11988656
10000502	11988798,11988874,11981396

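The first column is the user ID; the second is the comma-separated list of IDs of the news items that user browsed that day. The two columns are separated by a tab.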
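To make the record layout concrete, here is a minimal sketch of parsing one line into a user ID and its list of news IDs. The class `BrowseRecord` and its `parse` method are hypothetical names introduced for illustration; they are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post.
class BrowseRecord {
    final String userId;
    final List<String> newsIds;

    BrowseRecord(String userId, List<String> newsIds) {
        this.userId = userId;
        this.newsIds = newsIds;
    }

    // Parses "userID<TAB>newsID,newsID,..." into a record.
    static BrowseRecord parse(String line) {
        String[] parts = line.split("\t", -1);
        if (parts.length != 2 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("malformed line: " + line);
        }
        return new BrowseRecord(parts[0], Arrays.asList(parts[1].split(",")));
    }

    public static void main(String[] args) {
        // One of the sample lines shown above.
        BrowseRecord r = parse("10000502\t11988798,11988874,11981396");
        System.out.println(r.userId + " browsed " + r.newsIds.size() + " items");
    }
}
```

Rejecting malformed lines up front is a design choice of this sketch; silently skipping them would work just as well for a one-off count.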

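With this data, the simplest idea is to look at the distribution of how many news items users read.

First, sort the users by the number of news items they read, from high to low: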

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class GetTopUser {

	public static void main(String[] args) throws IOException {
		// user ID -> number of news items browsed that day
		Map<String, Integer> usermap = new HashMap<String, Integer>();

		// try-with-resources closes both files even if an exception is thrown
		try (BufferedReader br = new BufferedReader(new FileReader("F:\\felven\\user_newsid.txt"));
		     BufferedWriter bw = new BufferedWriter(new FileWriter("F:\\fuser.csv"))) {

			String line;
			while ((line = br.readLine()) != null) {
				// each line: userID <TAB> newsID,newsID,...
				String[] outline = line.split("\t", -1);
				if (outline.length < 2) {
					continue; // skip lines without a news ID column
				}
				usermap.put(outline[0], outline[1].split(",", -1).length);
			}

			// write "userID,count" rows, largest count first
			for (Map.Entry<String, Integer> entry : getSortedHashtableByValue(usermap)) {
				bw.write(entry.getKey() + "," + entry.getValue());
				bw.newLine();
			}
		}
	}

	// Returns the entries of the map sorted by value, from high to low.
	public static List<Map.Entry<String, Integer>> getSortedHashtableByValue(Map<String, Integer> h) {
		List<Map.Entry<String, Integer>> entries = new ArrayList<Map.Entry<String, Integer>>(h.entrySet());
		entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
		return entries;
	}

}

The output begins as follows:

5674681035476963338,4664
5674760244232720401,3328
20600187,2457
5674691824099266582,2199
27921511,1439
5687751514672599060,1144
5693365952146575377,901
5688933230787432458,897
5699392045035032598,869
51317069,857
5686681246776692770,853
5687367561260306461,812
53079175,774
5692903716043100164,773
5650687,770
5680990719402053652,767
18011207,757
46119937,723
5676210858208792603,632
16565521,623
5696793758671048720,567
51316830,556
25325106,555
17353008,554
5681349731125563425,554
21697370,552
4121,538
5691749455401848838,532
5687794320136998943,527
5677917920676548610,519
5692398325223919624,515

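The first column is the user ID and the second is the number of news items that user browsed that day. Next, count the users in each range: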

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;


public class Interval {

	public static void main(String[] args) throws IOException {
		// exclusive lower bounds of the buckets, from high to low
		int[] bounds = {1000, 900, 800, 700, 600, 500, 400, 300, 200, 100};
		// counts[i] belongs to bounds[i]; the extra last slot holds everything <= 100
		int[] counts = new int[bounds.length + 1];
		int sum = 0;

		try (BufferedReader br = new BufferedReader(new FileReader("F:\\felven\\fuser.csv"))) {
			String line;
			while ((line = br.readLine()) != null) {
				// each line: userID,count
				int n = Integer.parseInt(line.split(",")[1]);
				sum++;
				// find the first bucket whose lower bound n exceeds
				int i = 0;
				while (i < bounds.length && n <= bounds[i]) {
					i++;
				}
				counts[i]++;
			}
		}

		System.out.println("total user is " + sum);
		for (int i = 0; i < bounds.length; i++) {
			System.out.println(">" + bounds[i] + " is " + counts[i] + " and percent is " + (double) counts[i] / sum);
		}
		// this bucket also contains users with exactly 100 items; the "<100" label is kept as in the original run
		System.out.println("<100 is " + counts[bounds.length] + " and percent is " + (double) counts[bounds.length] / sum);
	}
}

The output is:

total user is 2334825
>1000 is 6 and percent is 2.5697857441135845E-6
>900 is 1 and percent is 4.2829762401893076E-7
>800 is 5 and percent is 2.1414881200946537E-6
>700 is 6 and percent is 2.5697857441135845E-6
>600 is 2 and percent is 8.565952480378615E-7
>500 is 15 and percent is 6.424464360283961E-6
>400 is 31 and percent is 1.3277226344586854E-5
>300 is 115 and percent is 4.9254226762177035E-5
>200 is 231 and percent is 9.893675114837301E-5
>100 is 1560 and percent is 6.68144293469532E-4
<100 is 2332853 and percent is 0.9991553970854347


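There are about 2.33 million users in total, and 99.9% of them browsed at most 100 news items (the last bucket includes exactly 100, despite its "<100" label). As for the users above 1000, such as the top one with 4664 items in a single day, it is safe to treat them as crawlers.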
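The breakdown that follows reads a file named fsmall.csv, and the post does not show how that file was produced. A minimal sketch, assuming it simply holds the rows of fuser.csv whose count is at most 100 (which would match the 2332853 users reported for it), could filter the rows like this. `SmallUserFilter` and `keepAtMost` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the original post.
class SmallUserFilter {

    // Keeps the "userID,count" rows whose count is at most maxCount.
    static List<String> keepAtMost(List<String> rows, int maxCount) {
        List<String> kept = new ArrayList<String>();
        for (String row : rows) {
            int count = Integer.parseInt(row.split(",")[1]);
            if (count <= maxCount) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three rows in the fuser.csv format, built from users shown earlier.
        List<String> rows = Arrays.asList("5674681035476963338,4664", "10000082,5", "10000170,11");
        System.out.println(keepAtMost(rows, 100));
    }
}
```

The same filter with a threshold of 10 would give the input assumed for the third breakdown (fssmall.csv), and it doubles as a way to drop the suspected crawlers before any further analysis.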

Next, break down the users with at most 100 items:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;


public class Interval {

	public static void main(String[] args) throws IOException {
		// exclusive lower bounds of the buckets below 100, from high to low
		int[] bounds = {90, 80, 70, 60, 50, 40, 30, 20, 10};
		// counts[i] belongs to bounds[i]; the extra last slot holds everything <= 10
		int[] counts = new int[bounds.length + 1];
		int exactly100 = 0;
		int sum = 0;

		// the input must contain only counts <= 100:
		// any larger value would be counted in the >90 bucket
		try (BufferedReader br = new BufferedReader(new FileReader("F:\\felven\\fsmall.csv"))) {
			String line;
			while ((line = br.readLine()) != null) {
				// each line: userID,count
				int n = Integer.parseInt(line.split(",")[1]);
				sum++;
				if (n == 100) {
					exactly100++;
					continue;
				}
				int i = 0;
				while (i < bounds.length && n <= bounds[i]) {
					i++;
				}
				counts[i]++;
			}
		}

		System.out.println("total user is " + sum);
		System.out.println("=100 is " + exactly100 + " and percent is " + (double) exactly100 / sum);
		for (int i = 0; i < bounds.length; i++) {
			System.out.println(">" + bounds[i] + " is " + counts[i] + " and percent is " + (double) counts[i] / sum);
		}
		System.out.println("<=10 is " + counts[bounds.length] + " and percent is " + (double) counts[bounds.length] / sum);
	}
}

The output is:

total user is 2332853
=100 is 54 and percent is 2.314762224623669E-5
>90 is 540 and percent is 2.314762224623669E-4
>80 is 915 and percent is 3.9222359917234393E-4
>70 is 1696 and percent is 7.270068024003227E-4
>60 is 2954 and percent is 0.0012662606688033922
>50 is 5822 and percent is 0.0024956566058812963
>40 is 13397 and percent is 0.005742753615422832
>30 is 33956 and percent is 0.014555567796170612
>20 is 101064 and percent is 0.04332206101284564
>10 is 356228 and percent is 0.15270057736171117
<=10 is 1816227 and percent is 0.7785432686928838


Here we can see that 77.8% of these users read at most 10 news items a day, so we break this group down further:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;


public class Interval {

	public static void main(String[] args) throws IOException {
		// counts[n] = users who read exactly n items, for n = 1..10;
		// counts[0] collects every other value
		int[] counts = new int[11];
		int sum = 0;

		try (BufferedReader br = new BufferedReader(new FileReader("F:\\felven\\fssmall.csv"))) {
			String line;
			while ((line = br.readLine()) != null) {
				// each line: userID,count
				int n = Integer.parseInt(line.split(",")[1]);
				sum++;
				counts[(n >= 1 && n <= 10) ? n : 0]++;
			}
		}

		System.out.println("total user is " + sum);
		for (int n = 10; n >= 0; n--) {
			System.out.println("=" + n + " is " + counts[n] + " and percent is " + (double) counts[n] / sum);
		}
	}
}

The result is:

total user is 1816227
=10 is 70543 and percent is 0.03884040926602236
=9 is 82179 and percent is 0.04524709741678766
=8 is 95689 and percent is 0.052685594917375414
=7 is 111435 and percent is 0.061355216060547495
=6 is 131828 and percent is 0.07258343808345542
=5 is 157174 and percent is 0.08653874212859956
=4 is 191520 and percent is 0.1054493738943425
=3 is 236860 and percent is 0.13041321376678136
=2 is 299791 and percent is 0.16506251696511504
=1 is 439208 and percent is 0.24182439750097318
=0 is 0 and percent is 0.0


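In the end, most users read between 1 and 4 news items: these four buckets cover about 64% of the users with at most 10 items, which is almost exactly half of all 2.33 million users. That is probably close to reality.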
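The combined share of the 1 to 4 item buckets can be checked directly from the counts printed above. The following is a small sketch of that arithmetic; the class name `CumulativeShare` and its `share` method are illustrative, while the numbers are copied from the outputs in this post.

```java
// Illustrative helper for the arithmetic; not part of the original programs.
class CumulativeShare {

    // Sum of counts[from..to] (inclusive indices) divided by total.
    static double share(long[] counts, int from, int to, long total) {
        long sum = 0;
        for (int i = from; i <= to; i++) {
            sum += counts[i];
        }
        return (double) sum / total;
    }

    public static void main(String[] args) {
        // Users who read exactly 1, 2, ..., 10 items, copied from the output above.
        long[] byCount = {439208, 299791, 236860, 191520, 157174, 131828, 111435, 95689, 82179, 70543};
        long atMost10 = 1816227; // users with at most 10 items
        long allUsers = 2334825; // all users in the one-day data set

        // Indices 0..3 hold the counts for 1 to 4 items.
        System.out.printf("share of users with <=10 items: %.2f%n", share(byCount, 0, 3, atMost10));
        System.out.printf("share of all users: %.2f%n", share(byCount, 0, 3, allUsers));
    }
}
```

The two ratios come out at roughly 0.64 and 0.50, so the 1 to 4 item group is a clear majority of light readers and about half of the whole population.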



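Update, 11.20

After joining Sohu I got access to more data, including user browsing records from the Sohu News mobile client. Here I analyze the data from November 2 to November 14.

First, count how many users used the Sohu News client each day. The first column is the date (MMDD) and the second is the number of users: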

1114 6539125
1113 6395081
1112 6668126
1111 6142667
1110 5650577
1109 6259603
1108 6034332
1107 6399206
1106 6372263
1105 6288124
1104 6279249
1103 6238395
1102 5893482

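Overall, the Sohu News client has about 6 million daily active users (between 5.65 and 6.67 million over these 13 days), which is quite impressive.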
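The post does not show how the daily user counts were computed, nor the layout of the mobile logs. As a rough sketch, assuming each log record can be reduced to a `date<TAB>userID` pair (an assumed format, not the real one), counting distinct users per day could look like this. `DailyActiveUsers` and `count` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; the real log format is not given in the post.
class DailyActiveUsers {

    // Counts distinct user IDs per date from "date<TAB>userID" records (an assumed format).
    static Map<String, Integer> count(List<String> records) {
        Map<String, Set<String>> usersByDate = new TreeMap<String, Set<String>>();
        for (String record : records) {
            String[] parts = record.split("\t", -1);
            Set<String> users = usersByDate.get(parts[0]);
            if (users == null) {
                users = new HashSet<String>();
                usersByDate.put(parts[0], users);
            }
            users.add(parts[1]);
        }
        // Replace each set of users by its size, keeping the dates sorted.
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, Set<String>> e : usersByDate.entrySet()) {
            result.put(e.getKey(), e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up records: user u1 appears twice on 1114 and is counted once.
        List<String> records = Arrays.asList("1114\tu1", "1114\tu2", "1114\tu1", "1113\tu1");
        System.out.println(count(records)); // prints {1113=1, 1114=2}
    }
}
```

Keeping one in-memory set per day is the simplest approach; with several million users a day it needs a fair amount of memory, so processing one day at a time may be more practical.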

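Next we analyze a single day, taking November 14 as the example. We count the total number of clicks (that is, the number of times a news item was opened), split the users into buckets by how many items they read, and compute each bucket's share of the clicks. In the output below, the number after "is" is the number of users in the bucket, and "percent" is the fraction of all clicks that came from those users.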

total click is 59749234
>1000 is 36 and percent is 0.0443303256406601
>=900 is 8 and percent is 1.28185743770372E-4
>=800 is 9 and percent is 1.2842005639771047E-4
>=700 is 15 and percent is 1.881195665203005E-4
>=600 is 26 and percent is 2.8095757679504307E-4
>=500 is 39 and percent is 3.55067313498948E-4
>=400 is 128 and percent is 9.214678802409417E-4
>=300 is 235 and percent is 0.001360285221397148
>=200 is 811 and percent is 0.003203873709912331
>=100 is 7805 and percent is 0.016746959467296266
<100 is 6530013 and percent is 0.9323563378235108

The total is close to 60 million clicks, which is impressive indeed.
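The "percent" column above reads as a share of clicks rather than of users: the <100 bucket holds 6530013 of the 6539125 users (about 99.9%) but is listed at 0.932. The code behind the table is not shown, so the following is only a sketch of how such click shares could be computed from per-user click counts, assuming the bucket boundaries implied by the labels. `ClickShare`, `bucket`, and `clickShares` are hypothetical names.

```java
// Hypothetical sketch of the per-bucket click share; not the original program.
class ClickShare {

    // Inclusive lower bounds of the middle buckets (>=900 ... >=100), high to low.
    static final int[] LOWER = {900, 800, 700, 600, 500, 400, 300, 200, 100};

    // Bucket index: 0 for >1000, 1..9 for the LOWER bounds, 10 for <100.
    static int bucket(int clicks) {
        if (clicks > 1000) {
            return 0;
        }
        for (int i = 0; i < LOWER.length; i++) {
            if (clicks >= LOWER[i]) {
                return i + 1;
            }
        }
        return LOWER.length + 1;
    }

    // Returns each bucket's share of the total clicks.
    static double[] clickShares(int[] clicksPerUser) {
        long[] clicks = new long[LOWER.length + 2];
        long total = 0;
        for (int c : clicksPerUser) {
            clicks[bucket(c)] += c;
            total += c;
        }
        double[] shares = new double[clicks.length];
        for (int i = 0; i < clicks.length; i++) {
            shares[i] = (double) clicks[i] / total;
        }
        return shares;
    }

    public static void main(String[] args) {
        // Made-up click counts for five users, for illustration only.
        int[] clicksPerUser = {1200, 950, 150, 30, 20};
        double[] shares = clickShares(clicksPerUser);
        System.out.println(">1000 share: " + shares[0]);
        System.out.println("<100 share: " + shares[10]);
    }
}
```

In the made-up data a single heavy user accounts for more than half of all clicks, which mirrors how the 36 users above 1000 items hold over 4% of the real clicks.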

As before, most of the clicks come from users who read no more than 100 items, so we analyze that group further:

total click is 55734477
=100 is 269 and percent is 4.8264559834301486E-4
>=90 is 12177 and percent is 0.0776632568024277
>=80 is 5312 and percent is 0.00802119305793432
>=70 is 8721 and percent is 0.011583189342567976
>=60 is 16171 and percent is 0.018557274700900128
>=50 is 31013 and percent is 0.0299886370154689
>=40 is 67823 and percent is 0.053355107288438355
>=30 is 159421 and percent is 0.09656979467125887
>=20 is 419290 and percent is 0.17809098666163137
>=10 is 1234776 and percent is 0.300725366096106
<10 is 4584152 and percent is 0.29699618424696084

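Users who read 0-20 items account for the largest share of clicks in this table, close to 60%, so they can serve as the typical users for further analysis; to narrow the scope, the 10-20 bucket alone is enough. One caveat: the >=90 row looks inflated. The user counts in this table add up to 6539125, the full user count for November 14, so users with more than 100 items seem to have been counted in the >=90 bucket even though the total click figure excludes them.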
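A bucket table like the one above can be sanity-checked by summing its columns: the shares should add up to 1, and the user counts should match the population the table was built from. The sketch below does this for the second table; the class name `BucketCheck` is illustrative and the numbers are copied from the output above.

```java
// Illustrative consistency check; not part of the original programs.
class BucketCheck {

    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    static double sum(double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // User counts and click shares of the second table, top row (=100) first.
        long[] users = {269, 12177, 5312, 8721, 16171, 31013, 67823, 159421, 419290, 1234776, 4584152};
        double[] shares = {
            4.8264559834301486E-4, 0.0776632568024277, 0.00802119305793432,
            0.011583189342567976, 0.018557274700900128, 0.0299886370154689,
            0.053355107288438355, 0.09656979467125887, 0.17809098666163137,
            0.300725366096106, 0.29699618424696084
        };
        System.out.println("users in table: " + sum(users));
        System.out.printf("sum of shares: %.3f%n", sum(shares));
    }
}
```

For this table the shares add up to about 1.072 instead of 1, and the user counts add up to 6539125, the full user count for November 14.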
