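Paper Summary: Deploying Android Security Updates: An Extensive Study Involving Manufacturers, Carriers, and End Users

Introduction

This post summarizes "Deploying Android Security Updates: An Extensive Study Involving Manufacturers, Carriers, and End Users" from CCS 2020. It is a pure measurement paper: it carries out extensive quantitative measurements of the rollout process for Android security updates and OS upgrades.

Contributions

  • A large-scale quantitative study of how manufacturers, carriers, and end users affect Android security updates and OS upgrades.
  • Data from different sources is aggregated and correlated, yielding new measurements of carrier- and manufacturer-related factors and of user-induced delays.
  • An evaluation of the current effectiveness of Android initiatives such as Android One and Project Treble, and a very early assessment of the recent Project Mainline.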

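Related Work

  • Measured the patch latency for code changes flowing from Qualcomm and Linux repositories into AOSP, and studied the lifecycle of vulnerabilities from discovery to the AOSP patch.
  • In-depth analysis of Android vulnerabilities: which system components are affected and, for code fixed in AOSP, when the vulnerabilities were introduced.
  • Studied the patching behavior of Android devices using detailed data collected directly from participating devices. The authors examined the Android source tree to determine when fixes from upstream open-source projects were included in Android updates. They then correlated this with device information collected by their Device Analyzer app (about 24,000 user devices) to compute a security metric for each phone model.
  • Analyzed the security patches present on end-user devices with a mobile app called SnoopSnitch, finding large differences in patch completeness among Android manufacturers.
  • Measured the percentage of devices running the patch level of the current Android Security Bulletin.
  • Measured the lifecycle support and update frequency of mobile manufacturers' Android security update models, as well as the time from a vulnerability report to the manufacturer's patch.
  • Compared the CVEs listed in each Android Security Bulletin with those in vendor-specific security bulletins, and computed the patch latency of each CVE.
  • Other studies are not limited to Android: they examine open-source and third-party patching, and include user studies of update behavior.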

Background

Treble Device

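Project Treble was introduced with Android 8.0. It is a major architectural change to the Android OS framework, designed to make it easier, faster, and cheaper for manufacturers to update devices to new Android versions. Project Treble applies to all new devices launching with Android 8.0 or later.
Treble provides a new, stable vendor interface through which device manufacturers access the hardware-specific parts of the Android code. As a result, a manufacturer can ship a new Android release by updating only the Android OS framework, without waiting for the chipset manufacturer.
Project Treble fundamentally changes the software stack by introducing a hardware abstraction layer (HAL) that separates the lower-level, device-specific vendor implementation from the Android OS framework. This reduces the overhead for manufacturers and chipset vendors when updating and upgrading devices.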
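The Python sketch below is a loose analogy for Treble's stable vendor interface. The framework depends only on an abstract interface, so a newer framework can replace it while the vendor implementation is reused untouched. Real Android HALs are defined in HIDL/AIDL, not Python. Every class and function name here (CameraHal, VendorACamera, Framework, upgrade_framework) is hypothetical.

```python
from abc import ABC, abstractmethod


class CameraHal(ABC):
    """Hypothetical stable vendor interface: the framework only calls these methods."""

    @abstractmethod
    def capture(self) -> bytes:
        """Return one raw frame from the camera hardware."""


class VendorACamera(CameraHal):
    """Hypothetical device-specific implementation supplied by a vendor."""

    def capture(self) -> bytes:
        return b"vendor-A-frame"


class Framework:
    """Stands in for the Android OS framework; it depends only on CameraHal."""

    def __init__(self, camera: CameraHal, version: str) -> None:
        self.camera = camera
        self.version = version

    def take_photo(self) -> bytes:
        return self.camera.capture()


def upgrade_framework(old: Framework, new_version: str) -> Framework:
    """Replace only the framework; the vendor implementation is reused unchanged."""
    return Framework(old.camera, new_version)


device = Framework(VendorACamera(), version="9")
upgraded = upgrade_framework(device, "10")
print(upgraded.version, upgraded.take_photo())
```

The key point is that `upgrade_framework` never touches `VendorACamera`. In Treble terms, the vendor side stays as-is while the framework is updated.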

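Security Update and OS Upgrade Process

Updating and upgrading an Android device follows a chain of command that involves several entities.
A security update starts from an Android vulnerability fixed in AOSP and is first handled by the chipset vendor. Next, the manufacturer integrates and tests it, depending on whether the product runs a customized OS version and on the mobile carrier's requirements. Finally, once the carrier approves, the manufacturer releases the update to end users.
OS upgrades follow a similar process. First, Google releases the Platform Development Kit (PDK) to manufacturers and chipset vendors; the PDK contains components from AOSP and similar sources. The device manufacturer then continues development:
  • it ensures compatibility with the chipset;
  • it meets Wi-Fi and Bluetooth standards;
  • it adds essential device components;
  • it applies vendor customizations or branding;
  • it tests all features.
Throughout this process, the manufacturer works with carriers and third-party test labs, which may request additional tests or additions.
Manufacturers and carriers therefore play a major role. The paper measures the overhead these entities add to the security update and OS upgrade process.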

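Android One

Android One is a hardware and software standard for running near-stock Android on participating phone models. Participating models are promised at least two OS upgrades and three years of monthly security updates. However, Google selects the models in the program on a case-by-case basis. As of December 10, 2019, the Android One website listed 23 participating phone models.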

Project Mainline

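Announced in May 2019 and supported on devices launching with Android 10, Project Mainline lets users update certain system components directly through Google Play [4]. Mainline ships these components as APK or APEX files, so they can be updated like apps. This effectively removes device manufacturers and carriers from the rollout process for these components.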

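Challenges

  • A manufacturer supporting many device models may need individual customizations for different mobile carriers. This can result in up to 1,500 variants of the same update or upgrade.
  • Locked and unlocked devices further complicate the update process. Locked models are tied to a specific carrier (i.e., they are carrier-supported).
  • There are a large number of carriers and Android device manufacturers, and no standard reporting mechanism.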

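Data Sources

Some of the data comes from public websites, including the Android Security Bulletins and the carriers' security update announcements. To retrieve and parse these pages, tools such as PyQt5, BeautifulSoup, and Selenium were used.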
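The post names PyQt5, BeautifulSoup, and Selenium for retrieving and parsing pages. To stay dependency-free, the sketch below swaps in Python's standard-library `html.parser`. It shows only the parsing half: extracting (model, release date, security patch level) records from a small page. The markup, the field labels, and the class name `UpdateParser` are hypothetical; real carrier pages are structured differently and change over time.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Hypothetical markup; real carrier pages differ and change over time.
SAMPLE_PAGE = """
<div class="update"><h3>Galaxy S9</h3>
  <p>Release date: 2019-06-14</p><p>Android security patch level: 2019-05-01</p></div>
<div class="update"><h3>Moto Z3</h3>
  <p>Release date: 2019-07-02</p><p>Android security patch level: 2019-06-01</p></div>
"""

DATE_RE = re.compile(r"(Release date|security patch level):\s*(\d{4}-\d{2}-\d{2})")
FIELDS = ("model", "release", "patch_level")


class UpdateParser(HTMLParser):
    """Collects one record (model, release date, patch level) per update block."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []
        self._in_heading = False
        self._current = {}

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            # A new heading starts a new update record
            self._current = {"model": text}
            return
        match = DATE_RE.search(text)
        if match:
            key = "release" if match.group(1) == "Release date" else "patch_level"
            self._current[key] = date.fromisoformat(match.group(2))
            if all(field in self._current for field in FIELDS):
                self.records.append(self._current)
                self._current = {}


parser = UpdateParser()
parser.feed(SAMPLE_PAGE)
parser.close()
for rec in parser.records:
    latency = (rec["release"] - rec["patch_level"]).days
    print(rec["model"], rec["release"], rec["patch_level"], latency)
```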

Android Security Bulletins

The monthly Android Security Bulletins published by AOSP.

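Carrier Dataset

Security update announcements published by the four major U.S. carriers.

  • AT&T does not publish security updates on a centralized page but on a separate page for each supported device model. The URLs for all device models were collected manually by going through each listed model on the support page.
  • T-Mobile lists security update notices separately for each device model; the URL of each model was collected programmatically.
  • Verizon's website shows only the three most recent software updates per model, which prevents collecting longitudinal data over time. The announcements were therefore collected three times over the course of a year.
  • Sprint publishes security update announcements as blog posts, so the entire Sprint Android Community board was searched.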
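Because Verizon shows only the latest three updates per model, repeated crawls have to be merged. The sketch below shows one way to do that merge. It assumes each crawl yields a mapping from model name to (release date, patch level) pairs. The snapshot data and the helper name `merge_snapshots` are made up for illustration.

```python
from datetime import date

# Made-up crawls: model -> the three most recent (release date, patch level) pairs shown at crawl time
SNAPSHOT_FEB = {
    "Galaxy S9": [
        (date(2019, 1, 10), date(2018, 12, 1)),
        (date(2018, 12, 5), date(2018, 11, 1)),
        (date(2018, 11, 2), date(2018, 10, 1)),
    ]
}
SNAPSHOT_JUN = {
    "Galaxy S9": [
        (date(2019, 5, 20), date(2019, 5, 1)),
        (date(2019, 3, 15), date(2019, 3, 1)),
        (date(2019, 1, 10), date(2018, 12, 1)),  # also seen in February
    ]
}


def merge_snapshots(*snapshots):
    """Union the per-model update lists of several crawls, dropping duplicates."""
    merged = {}
    for snapshot in snapshots:
        for model, updates in snapshot.items():
            merged.setdefault(model, set()).update(updates)
    # Return each model's updates in chronological order of release date
    return {model: sorted(updates) for model, updates in merged.items()}


history = merge_snapshots(SNAPSHOT_FEB, SNAPSHOT_JUN)
for release, patch_level in history["Galaxy S9"]:
    print(release, patch_level)
```

A limitation follows directly: if a model received more than three updates between two crawls, the older ones never appear in any snapshot.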

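End-User Dataset

The authors obtained anonymized HTTP access logs from Android devices via a U.S. social network whose app has more than 50 million downloads on the Google Play Store. Only data collected when users access the social network through the mobile app is used, and only requests from registered users are analyzed. Of the request data, only the content of POST requests is used.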

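Manufacturer Dataset

The goal here is to investigate the manufacturers' impact on the rollout process. Intuitively, unlocked models should not incur carrier-related delays, so the update process of both locked and unlocked models was analyzed.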

Measurements

Measuring the Impact of Carriers and Manufacturers

Rollout Frequency

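  • $T^{rel}$: the release date of the phone model
  • $T^{end}$: the end of the update period
  • $N^{pot}$: the number of security updates the model could potentially have received between $T^{rel}$ and $T^{end}$, computed as the number of months between the two timestamps
  • $N^{act}$: the number of updates the model was actually observed to receive in that period
  • $N^{act}/N^{pot}$: the update frequency
    Measurement perspectives:
    1) The relationship between normalized update frequency and individual models and carriers
    2) How a model's age relates to its update frequency and to the duration of its update support
    3) How the interdependence between carriers and manufacturers affects update frequency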
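The frequency metric above can be sketched as follows. The sketch assumes that $N^{pot}$ counts whole calendar months between $T^{rel}$ and $T^{end}$, and that only updates dated inside that window count toward $N^{act}$. The paper's exact boundary handling may differ, and the function names are hypothetical.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end; the day of the month is ignored."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def update_frequency(t_rel: date, t_end: date, update_dates) -> float:
    """N_act / N_pot for one (carrier, model) pair."""
    n_pot = months_between(t_rel, t_end)  # one potential update per month
    if n_pot <= 0:
        raise ValueError("T_end must fall at least one calendar month after T_rel")
    # N_act: updates actually observed inside [T_rel, T_end]
    n_act = sum(1 for d in update_dates if t_rel <= d <= t_end)
    return n_act / n_pot


# Toy example: 12 potential monthly updates, 9 observed (April to December 2018)
observed = [date(2018, month, 15) for month in range(4, 13)]
print(update_frequency(date(2018, 3, 1), date(2019, 3, 1), observed))
```

A value of 1.0 means the model received an update every month. The ratio can exceed 1.0 if more than one update ships in a month.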

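Rollout Latency

Rollout latency is the number of days between the release of an Android Security Bulletin and the release of the corresponding update by the carrier/manufacturer.
Measurement perspectives:
1) Update latency for each Android Security Bulletin.
2) Latency per carrier: every carrier introduces delays of weeks to months, and the carriers differ widely from one another.
3) The effect of the model's age.
4) The carrier-manufacturer relationship: depending on the carrier, the same model can experience different amounts of latency.
5) Other information in the Android Security Bulletins that may affect update latency.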
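The sketch below computes this latency and summarizes it per carrier. Like the scripts in the Code section, it takes the bulletin date to be the patch-level date (e.g., 2019-05-01). The median summary, the placeholder carrier names, and the records are illustrative assumptions, not the paper's data.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def rollout_latency_days(bulletin: date, update_release: date) -> int:
    """Days from the bulletin date to the carrier's/manufacturer's update release."""
    return (update_release - bulletin).days


def median_latency_by_carrier(records):
    """records: iterable of (carrier, bulletin_date, update_release_date) tuples."""
    per_carrier = defaultdict(list)
    for carrier, bulletin, release in records:
        per_carrier[carrier].append(rollout_latency_days(bulletin, release))
    return {carrier: median(days) for carrier, days in per_carrier.items()}


# Made-up toy records for two placeholder carriers (not data from the paper)
toy = [
    ("CarrierA", date(2019, 5, 1), date(2019, 5, 21)),
    ("CarrierA", date(2019, 6, 1), date(2019, 7, 11)),
    ("CarrierA", date(2019, 7, 1), date(2019, 7, 31)),
    ("CarrierB", date(2019, 5, 1), date(2019, 8, 9)),
]
print(median_latency_by_carrier(toy))
```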

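Measuring User Update Behavior

Security Update Frequency and Latency on End-User Devices

The user-agent strings from the mobile app are parsed to obtain the phone model variant, carrier, build number, and Android OS version. A model variant is a code name that can be mapped to a specific phone model. The phone model variant and carrier information in the user-agent strings are also matched against the carrier dataset.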
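The sketch below illustrates this parsing step under two assumptions. First, the app's user agent begins with the standard Dalvik format, `Dalvik/2.1.0 (Linux; U; Android <version>; <model variant> Build/<build number>)`. Second, it carries the carrier in an app-specific `carrier=` field. That field, the sample string, and the variant-to-model table are hypothetical; the post does not describe the real log format.

```python
import re

DALVIK_RE = re.compile(
    r"Android (?P<os>[\d.]+); (?P<variant>[^;)]+?) Build/(?P<build>[^)\s]+)\)"
)
CARRIER_RE = re.compile(r"carrier=(?P<carrier>[^;\]]+)")  # hypothetical app-specific field

# Hypothetical lookup from model variant (code name) to marketing model name
VARIANT_TO_MODEL = {"SM-G960U": "Galaxy S9"}

SAMPLE_UA = (
    "Dalvik/2.1.0 (Linux; U; Android 10; SM-G960U Build/QP1A.190711.020) "
    "ExampleApp/1.0 [carrier=Verizon]"
)


def parse_user_agent(ua: str):
    """Return OS version, variant, mapped model, build number, and carrier, or None."""
    m = DALVIK_RE.search(ua)
    if m is None:
        return None
    c = CARRIER_RE.search(ua)
    variant = m.group("variant")
    return {
        "os_version": m.group("os"),
        "variant": variant,
        "model": VARIANT_TO_MODEL.get(variant),  # None if the variant is unknown
        "build": m.group("build"),
        "carrier": c.group("carrier") if c else None,
    }


print(parse_user_agent(SAMPLE_UA))
```

A variant string containing `;` or `)` would need a more permissive pattern.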
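Measurement perspectives:
1) The distribution of unique build numbers per end-user device, for the 15 manufacturers with the most devices.
2) The latency between a carrier's update announcement and the appearance of the update on devices.
3) Device access behavior and latency.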
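The first two measurements can be sketched over a list of (device, build number, date seen) observations. A device is observed only when its user opens the app, so the first sighting is an upper bound on when the update was actually installed. This is presumably why device access behavior is listed as a separate factor. The observations and function names are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Made-up toy observations: (device_id, build_number, date the device was seen with that build)
OBSERVATIONS = [
    ("dev1", "BUILD_A", date(2019, 5, 1)),
    ("dev1", "BUILD_B", date(2019, 6, 10)),
    ("dev1", "BUILD_B", date(2019, 6, 20)),
    ("dev2", "BUILD_A", date(2019, 5, 3)),
    ("dev3", "BUILD_B", date(2019, 6, 5)),
]


def unique_builds_per_device(observations):
    """Number of distinct build numbers observed on each device."""
    builds = defaultdict(set)
    for device_id, build, _seen in observations:
        builds[device_id].add(build)
    return {device_id: len(b) for device_id, b in builds.items()}


def announcement_to_device_latency(observations, build, announced: date):
    """Days from the carrier's announcement of `build` to its first sighting on each device."""
    first_seen = {}
    for device_id, b, seen in observations:
        if b == build and (device_id not in first_seen or seen < first_seen[device_id]):
            first_seen[device_id] = seen
    return {device_id: (seen - announced).days for device_id, seen in first_seen.items()}


print(unique_builds_per_device(OBSERVATIONS))
print(announcement_to_device_latency(OBSERVATIONS, "BUILD_B", date(2019, 6, 3)))
```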

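End-User OS Upgrades

Measurement perspectives:
1) Android 10 upgrade latency. For devices upgraded to Android 10 in 2019, the upgrade latency is the time between the first observation of Android 10 on the device and either the official Android OS release date (September 3, 2019) or the date of the corresponding carrier update announcement.
2) OS upgrade behavior across device manufacturers.
3) The OS version distribution for Treble devices, Android One devices, and all active devices.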
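The upgrade-latency definition in item 1 can be written as a small function. The post does not say how the two reference dates are combined. This sketch assumes the carrier's announcement date is used when it exists, and the official release date (September 3, 2019) otherwise. The function name is hypothetical.

```python
from datetime import date
from typing import Optional

ANDROID_10_RELEASE = date(2019, 9, 3)  # official Android 10 release date given above


def android10_upgrade_latency(first_seen_android10: date,
                              carrier_announcement: Optional[date] = None) -> int:
    """Days from the reference date to the first sighting of Android 10 on a device.

    Assumption: the carrier's announcement is the reference when one exists,
    otherwise the official Android 10 release date.
    """
    reference = carrier_announcement if carrier_announcement is not None else ANDROID_10_RELEASE
    return (first_seen_android10 - reference).days


print(android10_upgrade_latency(date(2019, 12, 3)))
print(android10_upgrade_latency(date(2019, 12, 3), carrier_announcement=date(2019, 11, 20)))
```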

Effectiveness of Android Initiatives

Android One

Measurement perspective:
1) The OS versions on end-user devices participating in Android One.

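Project Treble

Measurement perspectives:
1) Unique build numbers per device, comparing Treble devices with all active devices.
2) The Android 10 rollout on Treble end-user devices: for each device manufacturer, the time between the Android 10 release (September 3, 2019) and the first observation of Android 10 on a Treble device.
3) The distribution of update rollout latency for Treble vs. non-Treble devices.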

Project Mainline

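Measurement perspective:
1) The impact cannot be measured directly. Instead, the authors identified 140,720 supported devices, i.e., devices whose model appears in the list of Mainline-supported devices. These account for 11.05% of the active devices in the end-user dataset.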
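Identifying supported devices amounts to testing each active device's model for membership in the supported-device list. The minimal sketch below assumes model names are matched after trimming and lower-casing. The toy input and the function name are hypothetical.

```python
def mainline_supported_share(device_models, supported_models):
    """Count and percentage of active devices whose model is on the supported-device list.

    Model names are compared after trimming and lower-casing (an assumption).
    """
    supported = {m.strip().lower() for m in supported_models}
    count = sum(1 for m in device_models if m.strip().lower() in supported)
    share = 100.0 * count / len(device_models) if device_models else 0.0
    return count, share


# Made-up toy input, one entry per active device
devices = ["Pixel 4", "pixel 4 ", "Galaxy S9", "Moto e5 play"]
print(mainline_supported_share(devices, ["Pixel 4"]))
```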

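Summary

The paper presents an extensive quantitative study of Android security updates. It evaluates the impact of manufacturers, carriers, and end users, as well as the effectiveness of Google-led projects for security updates and OS upgrades. It empirically quantifies the scale of the problem (e.g., rollout latency and frequency) and offers additional insights from new angles (e.g., locked vs. unlocked and Treble vs. non-Treble devices).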

Code

_Append_ALL_Carrier_Data.py, shown below, processes the data from all carriers and merges it into a single file.

import pandas as pd 
import numpy as np

# Clean AT&T
dfatt=pd.read_csv('../data/att_final.csv')
dfatt.drop(columns=['Extra_Details'],inplace=True)
dfatt['Carrier_Release']=pd.to_datetime(dfatt['Carrier_Release'])
dfatt['Security_Level']=pd.to_datetime(dfatt['Security_Level'])
dfatt=dfatt.sort_values(by=['Carrier_Release'])
dfatt['difference']=dfatt['Carrier_Release']-dfatt['Security_Level']
dfatt['difference']=dfatt['difference'].dt.days
dfatt=dfatt.loc[:,['Manufacture','Phone','Carrier_Release','Security_Level','difference','Android_Level_Guessed',"Baseband","Build_Num","Software_Ver"]]
dfatt=dfatt.astype(str)
dfatt['Build']=dfatt["Baseband"]+" "+dfatt["Build_Num"]+" "+dfatt["Software_Ver"]
dfatt.drop(columns=['Baseband',"Build_Num","Software_Ver"], inplace=True)
dfatt.columns=['Manufacture', 'Model', 'Release_Date', 'Bulletin_Level', 'difference','Android_Level_Guessed',"Build"]

# print(dfatt.head())
# print(len(dfatt))


# Clean Verizon
dfver=pd.read_csv('../data/VerizonTotal.csv')#,names=['Manufacture','Model', 'Bulletin_Level', 'Release_Date','Details','Android_Level_Guessed'])
dfver=dfver.astype(str)

dfver.Model=dfver.Model.str.replace("\xa0"," ",regex=False) # replace non-breaking spaces (literal match, works on any pandas version)
dfver.Model=dfver.Model.str.strip() # remove surrounding whitespace

# Calculate the time difference
dfver['Release_Date']=pd.to_datetime(dfver.Release_Date)
dfver['Bulletin_Level']=pd.to_datetime(dfver.Bulletin_Level)
dfver['difference']=dfver['Release_Date']-dfver['Bulletin_Level']
dfver['difference']=dfver['difference'].dt.days
dfver=dfver.loc[:,["Manufacture","Model","Bulletin_Level","Release_Date","difference","Android_Level_Guessed","Version"]]
# print(dfver.head())
# print(len(dfver))

# Clean Sprint
dfsprnt=pd.read_csv('../data/sprint_final.csv')
dfsprnt=dfsprnt.loc[:,['Manufacture','Model','Release_Date','Bulletin_Level','difference','Description','Android_Level_Guessed','Software_Version']]
dfsprnt=dfsprnt.drop(columns=['Description'])
dfsprnt['Release_Date']=pd.to_datetime(dfsprnt['Release_Date'])
dfsprnt['Bulletin_Level']=pd.to_datetime(dfsprnt['Bulletin_Level'])
dfsprnt['difference']=dfsprnt['Release_Date']-dfsprnt['Bulletin_Level']
dfsprnt['difference']=dfsprnt['difference'].dt.days

dfsprnt.drop_duplicates(inplace=True)
# print(dfsprnt.head())
# print(len(dfsprnt))

#Clean T-Mobile
dftmob=pd.read_csv('../data/tmob_final_CLEANED.csv')#,names=['Release_Date','ENHANCEMENTS','Manufacture', 'Model', 'OS' , 'Android_Level_Guessed','Bulletin_Level', 'difference',],index_col=False)
dftmob.drop(columns=['ENHANCEMENTS','OS_version'],inplace=True)
dftmob=dftmob.loc[:,['Manufacture', 'Model', 'Release_Date' , 'Bulletin_Level' , 'difference','Android_Level_Guessed','VERSION_BUILD']]
dftmob['Release_Date']=pd.to_datetime(dftmob['Release_Date'])
dftmob['Bulletin_Level']=pd.to_datetime(dftmob['Bulletin_Level'])
# print(dftmob.head())
# print(len(dftmob))

### NOTICE: A PROBLEM ARISES FOR PHONE MODEL NAMING AS GALAXY NOTE IS EITHER NOTE {DIGIT} OR NOTE{DIGIT}
# ###################### COMBINE ALL DATA ######################
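# drop rows without a bulletin level
# (dfatt was cast to str above, so its missing dates are the string 'NaT' rather than NaN)
dfatt=dfatt[dfatt['Bulletin_Level']!='NaT']
dfver=dfver.dropna(subset=['Bulletin_Level'])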
print(dfver.columns)

# normalize columns names for future merge
dfatt.columns=['Manufacture', 'Model', 'Release_Date', 'Bulletin_Level','difference','Android_Level_Guessed',"build"]
dfatt['Carrier']="AT&T"

dfver.columns=['Manufacture', 'Model','Bulletin_Level','Release_Date', 'difference','Android_Level_Guessed',"build"]
dfver['Carrier']="Verizon"

dftmob.columns=['Manufacture', 'Model', 'Release_Date', 'Bulletin_Level','difference','Android_Level_Guessed',"build"]
dftmob['Carrier']="TMobile"

dfsprnt.columns=['Manufacture', 'Model', 'Release_Date', 'Bulletin_Level','difference','Android_Level_Guessed',"build"]
dfsprnt['Carrier']="Sprint"


# append all together
# DataFrame.append was removed in pandas 2.0; pd.concat works on all versions
finalDF=pd.concat([dfatt, dfver, dftmob, dfsprnt])

# shift carrier columns
columns=list(finalDF)
columns.insert(0, columns.pop(columns.index('Carrier')))
finalDF= finalDF.loc[:, columns]

# sort by carrier, manufacture, model, bulletin level
print(len(finalDF))
print(finalDF.head())
# Store the latency (in days) as a float; outliers are filtered later, in _Carrier_normalize_and_timeFilter.py
finalDF.difference=finalDF.difference.astype('double')

# Save the remaining dataframe!!
finalDF.to_csv('../data/allCarrierData_merged.csv',index=False)
# Inspect "higher-level" bulletins, i.e., patch levels that are not the 1st of the month (such as YYYY-MM-05)
# Take the date part first: str() of a Timestamp ends in "00:00:00", not in the day
finalDF['checker']=finalDF.Bulletin_Level.apply(lambda x: str(x)[:10][-2:])
print(finalDF[finalDF.checker!="01"])

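_AddReleaseYear.py, shown below, normalizes manufacturer and model names and attaches each model's release year and month. Most of its `Series.str.replace` calls use regular-expression patterns (e.g., `^LG V20$`) and rely on pandas 1.x, where `regex=True` was the default; the script silences `FutureWarning` at the top. Since pandas 2.0 the default is `regex=False`, so on newer versions those calls need an explicit `regex=True`.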

import pandas as pd
import numpy as np
from warnings import simplefilter

simplefilter(action="ignore",category=FutureWarning)
dfCar=pd.read_csv('../data/allCarrierData_merged.csv')
dfRelease=pd.read_csv('../data/RecentlyReleasedPhones.csv')

dfCar.Manufacture=dfCar.Manufacture.astype('str')
dfCar.Model=dfCar.Model.str.strip()
# First, adjust time of release... only consider August 2015 and later
print(len(dfCar))
dfCar.Release_Date=pd.to_datetime(dfCar.Release_Date)

dfCar=dfCar[dfCar.Release_Date>=pd.to_datetime("2015-08-01")]
dfCar=dfCar[dfCar.Release_Date<pd.to_datetime("2020-01-01")]
print(len(dfCar))
# Adjust phone names as needed!
dfCar.Manufacture=dfCar.Manufacture.str.replace("^Other$","ASUS")
dfCar.Manufacture=dfCar.Manufacture.str.replace("^Blackberry$","BlackBerry")
dfCar.Manufacture=dfCar.Manufacture.str.replace("^Samgung$","Samsung")
dfCar.Manufacture=dfCar.Manufacture.str.replace("^kyocera$","Kyocera")
dfCar.Manufacture=dfCar.Manufacture.str.replace("^lg$","LG")
dfCar.Manufacture=dfCar.Manufacture.str.replace("^palm$","Palm")
# Any galaxy note phones with digits should be note#
dfCar.Model = dfCar.Model.str.strip()
if len(dfCar[dfCar.Model.str.contains("Note ")]) > 0:
    for i in range(10):
        dfCar.Model = dfCar.Model.str.replace("Note " + str(i), "Note" + str(i))

# S6 Edge Plus AND Galaxy S6 Edge + to Galaxy S6 Edge+ (literal matches, so '+' is not treated as a regex quantifier)
dfCar.Model = dfCar.Model.str.replace("S6 Edge Plus", "Galaxy S6 Edge+", regex=False)
dfCar.Model = dfCar.Model.str.replace("Galaxy S6 Edge +", "Galaxy S6 Edge+", regex=False)

# Normalize Motorola phones as needed
if dfCar.Model.str.contains("moto ", regex=False, na=False).any():
    dfCar.Model = dfCar.Model.str.replace("moto ", "Moto ", regex=False)

dfCar.Model = dfCar.Model.str.replace("^Moto Z2 Force Edition", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^Moto z2 force edition", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^moto z2 force edition", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^Motorola Moto Z2 Force Edition$", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^Motorola Moto X$", "Moto X")
dfCar.Model = dfCar.Model.str.replace("^Motorola Nexus 6$", "Nexus 6")
dfCar.Model = dfCar.Model.str.replace("^g6$", "Moto g6")

# Prepend LG as needed
dfCar.Model = dfCar.Model.str.replace(r"^LG G6 \(H871$", "G6")
dfCar.Model = dfCar.Model.str.replace(r"LG  H871S\)$", "G6")
# .at only accepts scalar labels; use a boolean mask with .loc
dfCar.loc[dfCar.Model == "G6", 'Manufacture'] = 'LG'

dfCar.Model = dfCar.Model.str.replace(r"^Google Pixel 4$", "Pixel 4")
dfCar.loc[dfCar.Model == "Pixel 4", 'Manufacture'] = 'Google'

dfCar.Model = dfCar.Model.str.replace(r"^Google  Pixel 4 XL$", "Pixel 4 XL")
dfCar.loc[dfCar.Model == "Pixel 4 XL", 'Manufacture'] = 'Google'

dfCar.Model = dfCar.Model.str.replace("^G7 ThinQ", "LG G7 ThinQ")
dfCar.Model = dfCar.Model.str.replace("^LG V20$", "V20")
dfCar.Model = dfCar.Model.str.replace("^LG G2$", "G2")
dfCar.Model = dfCar.Model.str.replace("^LG G3$", "G3")
dfCar.Model = dfCar.Model.str.replace("^LG G4$", "G4")
dfCar.Model = dfCar.Model.str.replace("^LG G5$", "G5")
dfCar.Model = dfCar.Model.str.replace("^LG G6$", "G6")
dfCar.Model = dfCar.Model.str.replace("^G8", "G8 ThinQ")
dfCar.Model = dfCar.Model.str.replace("G8 ThinQ ThinQ", "G8 ThinQ")
dfCar.Model = dfCar.Model.str.replace("^LG K10$", "K10")
dfCar.Model = dfCar.Model.str.replace("^LG K20$", "K20")
dfCar.Model = dfCar.Model.str.replace("LG Phoenix 3", "K20")  # AT&T specific...
dfCar.Model = dfCar.Model.str.replace("LG Phoenix 2", "LG K8")  # AT&T specific...
dfCar.Model = dfCar.Model.str.replace("Stylo 2 PLUS", "LG Stylo 2 Plus")
dfCar.Model = dfCar.Model.str.replace("^Stylo 3$", "LG Stylo 3")
dfCar.Model = dfCar.Model.str.replace("^Stylo 3 Plus$", "LG Stylo 3 Plus")
dfCar.Model = dfCar.Model.str.replace("^Aristo$", "LG Aristo")
dfCar.Model = dfCar.Model.str.replace("^Aristo 2$", "LG Aristo 2")
dfCar.Model = dfCar.Model.str.replace("Aristo 2 PLUS", "LG Aristo 2 PLUS")
dfCar.Model = dfCar.Model.str.replace("Leon LTE", "Leon")
dfCar.Model = dfCar.Model.str.replace("^Q7\+$", "LG Q7+")

# Other phones
dfCar.Model = dfCar.Model.str.replace("^ZTE Maven", "Maven")
dfCar.Model = dfCar.Model.str.replace("^ZTE Maven 2", "Maven 2")
dfCar.Model = dfCar.Model.str.replace("^Nexus 7 2013 LTE", "Google Nexus 7 (2013)")
dfCar.Model = dfCar.Model.str.replace("^J3 Prime", "Galaxy J3 Prime")
dfCar.Model = dfCar.Model.str.replace("^One$", "HTC One")
dfCar.Model = dfCar.Model.str.replace("^One LTE$", "HTC One")
dfCar.Model = dfCar.Model.str.replace("^Galaxy Note10\.1 2014 Edition$", "Galaxy Note 10.1 (2014)")
dfCar.Model = dfCar.Model.str.replace("^Galaxy J7 Star$", "Galaxy J7 (2018)")
dfCar.Model = dfCar.Model.str.replace("^Galaxy J3 Star$", "Galaxy J3 (2018)")
dfCar.Model = dfCar.Model.str.replace("^Galaxy J3 V \(3rd Gen\.\)$", "Galaxy J3 (2018)")
dfCar.Model = dfCar.Model.str.replace("^Galaxy J7 V \(2nd Gen\.\)$", "Galaxy J7 (2018)")
dfCar.Model = dfCar.Model.str.replace("^HTC One M9$", "One M9")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy Tab A \(8\.0\)$", "Galaxy Tab A 8.0 (2018)")
dfCar.Model = dfCar.Model.str.replace("^Essential Phone$", "Essential PH-1")
dfCar.Model = dfCar.Model.str.replace("^Xperia Z1S 4G LTE$", "Sony Xperia Z1S")
dfCar.Model = dfCar.Model.str.replace("^6T$", "OnePlus 6T")
dfCar.Model = dfCar.Model.str.replace("^Kyocera DuraForce Pro$", "DuraForce Pro")
dfCar.Model = dfCar.Model.str.replace("^Tab A \(2018\)$", "Galaxy Tab A 8.0 (2018)")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy Tab A \(10\.5\)$", "Galaxy Tab A (2018, 10.5)")
dfCar.Model = dfCar.Model.str.replace("^Galaxy J3 Eclipse$", "Galaxy J3 Emerge")
dfCar.Model = dfCar.Model.str.replace("^Galaxy J3 Prime$", "Galaxy J3 Emerge")
dfCar.Model = dfCar.Model.str.replace("^Galaxy Express Prime 2$", "Galaxy J3 Emerge")
dfCar.Model = dfCar.Model.str.replace("^Galaxy Note10\+ 5G$", "Galaxy Note10+")
dfCar.Model = dfCar.Model.str.replace("^Galaxy Note10 5G$", "Galaxy Note10")
dfCar.Model = dfCar.Model.str.replace("^KEYone$", "BlackBerry KEYone")
dfCar.Model = dfCar.Model.str.replace("^HTC 10$", "10")

dfCar.loc[dfCar.Model.str.contains("Galaxy", na=False), 'Manufacture'] = 'Samsung'

dfCar.loc[dfCar.Manufacture.str.contains("Pointing out that this", na=False), 'Manufacture'] = 'Samsung'

# Fix capitalization
dfCar.Model = dfCar.Model.str.replace("Nexus 5X", "Nexus 5x")
dfCar.Model = dfCar.Model.str.replace("Nexus 6P", "Nexus 6p")
dfCar.Model = dfCar.Model.str.replace("^Moto G6 Play$", "Moto g6 play")
dfCar.Model = dfCar.Model.str.replace("^Moto G7 power$", "Moto g7 power")
dfCar.Model = dfCar.Model.str.replace("^Moto E4$", "Moto e4")
dfCar.Model = dfCar.Model.str.replace("^Moto E4 Plus$", "Moto e4 plus")
dfCar.Model = dfCar.Model.str.replace("^Moto E5 Play$", "Moto e5 play")
dfCar.Model = dfCar.Model.str.replace("^Galaxy S10 5G$", "Galaxy S10")
dfCar.Model = dfCar.Model.str.replace("^Galaxy S7 edge$", "Galaxy S7 Edge")
dfCar.Model = dfCar.Model.str.replace("^Galaxy S6 edge$", "Galaxy S6 Edge")
dfCar.Model = dfCar.Model.str.replace("^Galaxy GRAND Prime$", "Galaxy Grand Prime")
dfCar.Model = dfCar.Model.str.replace("^6$", "Nexus 6")

# All Nexus 6 devices should be Motorola (designed in coordination with Google)
dfCar.loc[dfCar.Model == "Nexus 6", 'Manufacture'] = 'Motorola'

# All Nexus 5X devices should be LG (designed in coordination with Google); renamed to "Nexus 5x" above
dfCar.loc[dfCar.Model == "Nexus 5x", 'Manufacture'] = 'LG'

# All Nexus 6P devices should be Huawei (designed in coordination with Google); renamed to "Nexus 6p" above
dfCar.loc[dfCar.Model == "Nexus 6p", 'Manufacture'] = 'Huawei'

# Some patches apply to multiple phones. Must separate these patches as needed...
print("######## Need to split these models into different patches ########")
if dfCar.Model.str.contains(" and ", regex=False, na=False).any():
    print(dfCar[dfCar.Model.str.contains(" and ", regex=False, na=False)].Model)

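if dfCar.Model.str.contains("/", regex=False, na=False).any():
    dfCar.Model = dfCar.Model.str.replace("Phoenix Plus / ", "", regex=False)  # Remove odd extra
    # Duplicate "V30/V30+" updates so each model gets its own row.
    # Literal matching: as a regex, "/V30+" would mean "/V3" followed by one or more "0".
    dfCar_toAppend = dfCar[dfCar.Model.str.contains("/V30+", regex=False, na=False)].copy()
    dfCar.Model = dfCar.Model.str.replace("/V30+", "", regex=False)
    dfCar_toAppend.Model = dfCar_toAppend.Model.str.replace("V30/", "", regex=False)
    print("Number of updates for multiple models")
    print(len(dfCar_toAppend))
    print(dfCar_toAppend.Carrier.unique())
    dfCar = pd.concat([dfCar, dfCar_toAppend], ignore_index=True)
    print(dfCar[dfCar.Model.str.contains("/", regex=False, na=False)].Model)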

# Fix any remaining issues
dfCar.Model = dfCar.Model.str.replace("( )+", " ")
dfCar.Model = dfCar.Model.str.replace("^e4$", "Moto e4")
dfCar.Model = dfCar.Model.str.replace("^e4 plus$", "Moto e4 plus")
dfCar.Model = dfCar.Model.str.replace("^e4 gold$", "Moto e4 gold")
dfCar.Model = dfCar.Model.str.replace("^e5 go$", "Moto e5 go")
dfCar.Model = dfCar.Model.str.replace("^e5 play$", "Moto e5 play")
dfCar.Model = dfCar.Model.str.replace("^g6 play$", "Moto g6 play")
dfCar.Model = dfCar.Model.str.replace("^g7 power$", "Moto g7 power")
dfCar.Model = dfCar.Model.str.replace("^z2 force edition", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^Moto z2 force", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^z3", "Moto z3")
dfCar.Model = dfCar.Model.str.replace("^z4", "Moto z4")
dfCar.Model = dfCar.Model.str.replace("^2 V", "Nokia 2 V")
dfCar.Model = dfCar.Model.str.replace("^PRO 2", "DuraForce PRO 2")
dfCar.Model = dfCar.Model.str.replace("^PRO with Sapphire Shield", "DuraForce PRO with Sapphire Shield")
dfCar.Model = dfCar.Model.str.replace("^Kyocera DuraForce PRO 2", "DuraForce PRO 2")
dfCar.Model = dfCar.Model.str.replace("^Kyocera DuraForce PRO with Sapphire Shield",
                                      "DuraForce PRO with Sapphire Shield")
dfCar.Model = dfCar.Model.str.replace("^Stylo 4\+", "LG Stylo 4+")
dfCar.Model = dfCar.Model.str.replace("^Stylo 5", "LG Stylo 5")
dfCar.Model = dfCar.Model.str.replace("^Stylo 2", "LG Stylo 2")
dfCar.Model = dfCar.Model.str.replace("^Stylo 4", "LG Stylo 4")
dfCar.Model = dfCar.Model.str.replace("^Stylo 2 V", "LG Stylo 2 V")
dfCar.Model = dfCar.Model.str.replace("^Galaxy Tab E \(8.0\)", "Galaxy Tab E 8.0")
dfCar.Model = dfCar.Model.str.replace("^Galaxy Tab E$", "Galaxy Tab E 8.0")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy S6 edge [+]+", "Galaxy S6 Edge+")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy S6 Edge[+]{2}", "Galaxy S6 Edge+")
dfCar.Model = dfCar.Model.str.replace("^z2 play", "Moto z2 play")
dfCar.Model = dfCar.Model.str.replace("^Motorola Moto e5 play", "Moto e5 play")
dfCar.Model = dfCar.Model.str.replace("^Motorola Moto G6 Play", "Moto G6 Play")
dfCar.Model = dfCar.Model.str.replace("^Palm Palm", "Palm")
dfCar.Model = dfCar.Model.str.replace("^Galaxy S 4", "Galaxy S4")
dfCar.Model = dfCar.Model.str.replace("^Galaxy S 5", "Galaxy S5")
dfCar.Model = dfCar.Model.str.replace("^Galaxy S 4 mini", "Galaxy S4 mini")
dfCar.Model = dfCar.Model.str.replace("^E5 Play", "Moto e5 play")
dfCar.Model = dfCar.Model.str.replace(r"^HTC One \(M8\)$", "One M8")
dfCar.Model = dfCar.Model.str.replace("^One \(M8\)$", "One M8")
dfCar.Model = dfCar.Model.str.replace("^DROID MINI", "Droid Mini")
dfCar.Model = dfCar.Model.str.replace("^Droid TURBO", "Droid Turbo")
dfCar.Model = dfCar.Model.str.replace("^DROID TURBO 2", "Droid Turbo 2")
dfCar.Model = dfCar.Model.str.replace("^Phone", "Palm")
dfCar.Model = dfCar.Model.str.replace("^E 4th Generation", "Moto E 4th Generation")
dfCar.Model = dfCar.Model.str.replace(" \(J727A\)", "")
dfCar.Model = dfCar.Model.str.replace('Galaxy Note\x01', "Galaxy Note")
dfCar.Model = dfCar.Model.str.replace("^Z3 play", "Moto Z3 play")
dfCar.Model = dfCar.Model.str.replace("^Z3 Play", "Moto Z3 play")
dfCar.Model = dfCar.Model.str.replace("^Z2 Force", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace("^G7 Play", "Moto G7 play")
dfCar.Model = dfCar.Model.str.replace("^HTC One A9", "One A9")
dfCar.Model = dfCar.Model.str.replace("^LG G8X ThinQ", "G8 ThinQ")
dfCar.Model = dfCar.Model.str.replace("^e6$", "Moto e6")
dfCar.Model = dfCar.Model.str.replace("^LG V30$", "V30")
dfCar.Model = dfCar.Model.str.replace("^G6$", "LG G6")
dfCar.Model = dfCar.Model.str.replace("^GizmoTablet$", "Gizmo Tablet")
dfCar.Model = dfCar.Model.str.replace("^LG G8$", "LG G8 ThinQ")

dfCar.Manufacture = dfCar.Manufacture.str.replace("^DuraForce$", "Kyocera")

dfCar.Model = dfCar.Model.str.replace("^HYDROGEN ONE$", "RED Hydrogen One")
dfCar.Model = dfCar.Model.str.replace("^Orbic Wonder$", "Wonder")
dfCar.Model = dfCar.Model.str.replace("^Tab A 8\.0$", "Galaxy Tab A 8.0 (2018)")
dfCar.Model = dfCar.Model.str.replace("^HTC Desire 626$", "Desire 626")
dfCar.Model = dfCar.Model.str.replace(r"^Slate 8$", "Slate 8 Tablet")
dfCar.Model = dfCar.Model.str.replace(r"^Moto Z2 Force", "Moto Z (2) Force")
dfCar.Model = dfCar.Model.str.replace(r"^GS7 Edge Special Edition$", "Galaxy S7 Edge Special Edition")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy S7 Edge special edition$", "Galaxy S7 Edge Special Edition")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy Express Prime 3$", "Galaxy Express Prime 3")
dfCar.Model = dfCar.Model.str.replace(r"^Moto G Play$", "Moto G4 Play")
dfCar.Model = dfCar.Model.str.replace(r"^Tab A 8\.0$", "Galaxy Tab A 8.0 (2018)")
dfCar.Model = dfCar.Model.str.replace(r"^Galaxy Tab A 8\.0$", "Galaxy Tab A 8.0 (2018)")
dfCar.Model = dfCar.Model.str.replace(r"^LG G6$", "G6")
dfCar.Model = dfCar.Model.str.replace(r"^Alcatel IDEAL$", "Alcatel OneTouch IDEAL")
dfCar.Model = dfCar.Model.str.replace(r"^Moto z2 play$", "Moto Z (2) Play")  # pattern was "Moto z2 play^$", which could never match
dfCar.Model = dfCar.Model.str.replace(r"^LG G8 ThinQ$", "G8 ThinQ")
dfCar.Model = dfCar.Model.str.replace(r"^8 Tablet$", "Slate 8 Tablet")
dfCar.Model = dfCar.Model.str.replace(r"^Moto z2 play$", "Moto Z (2) play")
dfCar.Model = dfCar.Model.str.replace(r"^G6 Play$", "moto g6 play")
dfCar.Model = dfCar.Model.str.replace(r"^E5 Plus$", "moto e5 plus")
dfCar.Model = dfCar.Model.str.replace(r"^E4 Plus$", "moto e4 plus")

if dfCar.Model.str.contains(" and ", regex=False, na=False).any():
    # Assumes the only multi-model entry left is "Alcatel IdealXCITE and CAMEOX":
    # copy its updates to a CAMEOX row, then rename the original rows to IdealXCITE
    toAppend = dfCar[dfCar.Model.str.contains(" and ", regex=False, na=False)].copy()
    print("Total number of updates for multiple models")
    print(len(toAppend))
    print(toAppend.Carrier.unique())
    toAppend['Model'] = "Alcatel CAMEOX"
    dfCar = pd.concat([dfCar, toAppend], ignore_index=True)
    dfCar.Model = dfCar.Model.str.replace(r"^Alcatel IdealXCITE and CAMEOX$", "Alcatel IdealXCITE")
    # print(dfCar[dfCar.Model.str.contains("Alcatel")].Model)


dfCar.Model = dfCar.Model.str.strip()
dfCar.Manufacture = dfCar.Manufacture.str.strip()
######### Start Appending Release Years #########
dfCar['modelMerge']=dfCar.Model.str.strip()
dfCar['modelMerge']=dfCar.modelMerge.str.rstrip(' ')
dfCar['modelMerge']=dfCar.modelMerge.str.lower()
print(dfCar.head())

print("--------------------")
print(dfRelease.columns)
dfRelease.drop(columns=['Released OS','Current Supported OS','Reference Source','Android One'],inplace=True)
dfRelease.columns=['Manufacture','Model','Phone_Release']
dfRelease['Phone_Release']=dfRelease['Phone_Release'].str.replace(r'^([0-9]{4})/([0-9])$',r"\1/0\2") # normalize dates
dfRelease['modelMerge']=dfRelease.Model.str.lstrip(' ')
dfRelease['modelMerge']=dfRelease.modelMerge.str.rstrip(' ')
dfRelease['modelMerge']=dfRelease.modelMerge.str.lower()
dfRelease.drop(columns=['Manufacture','Model'],inplace=True)
print(dfRelease.head())
test=pd.merge(dfCar, dfRelease, on=['modelMerge'],how='left')
test.Phone_Release=test.Phone_Release.astype('str')
test['Year_Phone_Release']=test.Phone_Release.copy()
test['Year_Phone_Release']=test.Year_Phone_Release.apply(lambda x: x[:4])
print(test.head())
print(len(dfCar))
print(len(test))
print(len(dfCar)-len(test))
print(len(test)/len(dfCar))
noGood=[]
for index,value in test.iterrows():
    if value.Phone_Release=='nan':
        noGood.append(value.values)
        #print(value.values)
noGoodDF=pd.DataFrame(noGood,columns=['Carrier','Manufacture','Model','Release_Date','Security_Patch','difference','Android_Level_Guessed', 'build','modelMerge','Phone_Release','Year_Phone_Release'])
uniqPhone=noGoodDF.modelMerge.unique()
print(len(uniqPhone))
print(noGoodDF.head())
noGoodCounts = noGoodDF.groupby(['Manufacture','Model'])['Release_Date'].nunique()
noGoodCounts=noGoodCounts.sort_values(ascending=False)

print("Unique phones: "+str(len(noGoodCounts)))
# print(noGoodCounts[noGoodCounts==7]) #.iloc[48:-10]
# print(noGoodCounts)


for index,value in noGoodCounts.items():  # Series.iteritems was removed in pandas 2.0
    print(str(index[0])+";"+str(index[1])+";"+str(value))

# print("-----------------------------------")
# print(test[test.Manufacture.str.contains("Nexus")])
#print(test[test.Manufacture=="HTC"])

# print(test[test.Model=="6"])
# print(test[test.Model.str.contains("2nd")])#.Model.unique())



# ASSUMPTION
    # Stylo 4 is LG Q Stylo 4
    # Moto X is 1st generation
    # LG Aristo: https://pc-tablet.com/lg-aristo-price-specs-features-and-release-date-in-the-us/

# print(test[test.Model=="6T"])
###### Result to save
finalDF=pd.merge(dfCar, dfRelease, on=['modelMerge'],how='inner')
finalDF.Phone_Release=finalDF.Phone_Release.astype('str')
finalDF['Year_Phone_Release']=finalDF.Phone_Release.copy()
finalDF['Year_Phone_Release']=finalDF.Year_Phone_Release.apply(lambda x: x[:4])

finalDF['Month_Phone_Release']=finalDF.Phone_Release.copy()
finalDF['Month_Phone_Release']=finalDF.Month_Phone_Release.apply(lambda x: x[-2:])
print(len(finalDF))
print(len(finalDF)/len(dfCar)*100)
print(finalDF.head())
finalDF.sort_values(by=['Carrier','Manufacture','Model','Release_Date'],inplace=True)
finalDF.drop(columns=['modelMerge','Phone_Release'],inplace=True)



#dfCar=pd.read_csv('../Carrier_Latency/allCarrierData_cleaned.csv')
print("Original File: ")
print(len(dfCar))
print()
print("------------------------------------------------")
print("Cleaned File: ")
print(len(finalDF))
print()
print("AT&T")

print("Sprint")

print("Tmobile")

print("Verizon")


print("Percentage of Usable Updates: ")
percent=(len(finalDF)/len(dfCar))*100
print(percent)
print(len(finalDF))
print(finalDF.Carrier.unique())
print("AT&T")
print(len(finalDF[finalDF.Carrier=='AT&T']))
print("Sprint")
print(len(finalDF[finalDF.Carrier=='Sprint']))
print("Tmobile")
print(len(finalDF[finalDF.Carrier=='TMobile']))
print("Verizon")
print(len(finalDF[finalDF.Carrier=='Verizon']))
finalDF.to_csv('../data/allCarrierData_merged_years.csv',index=False)
print(len(dfCar.Model.unique()))
print(len(dfCar.Manufacture.unique()))
print("--------------------------------")
print(len(finalDF.Model.unique()))
print(len(finalDF.Manufacture.unique()))

# print(dfCar[dfCar.Manufacture=="Pointing out that this OS update is specific to the S7 Special Edition only. Samsung"])
print(dfCar.Manufacture.unique())


# Look at phone release vs update release. Any updates reported prior?

# print(finalDF.head())


# Treat each phone as available from the first day of the month after its release month.
# DateOffset rolls December over into January of the following year
# (a string-based month + 1 would map December back to January of the same year).
finalDF['PhoneRelease'] = pd.to_datetime(
    finalDF['Year_Phone_Release'].astype(str) + "-" + finalDF['Month_Phone_Release'].astype(str) + "-01"
) + pd.DateOffset(months=1)
finalDF['Release_Date'] = pd.to_datetime(finalDF['Release_Date'])

print()

hm = finalDF[finalDF['Release_Date'] < finalDF['PhoneRelease']]

print(len(hm))
print(hm.Model.unique())

for mod in hm.Model.unique():
    tmps = hm[hm.Model == mod]
    tmps = tmps.loc[:, ['Carrier', 'Manufacture', 'Model', 'Release_Date', 'PhoneRelease']]
    print(len(tmps))
    print(tmps.head())

    print("--------------------------------------------------")
    print("--------------------------------------------------")
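The script above applies a long chain of `str.replace` calls whose correctness depends on their order and on pandas' regex default. One possible alternative, sketched below and not taken from the original code, keeps the rules in a single ordered table. It applies them with Python's `re` module, so the behavior does not depend on the pandas version. Only a few rules, adapted from the script, are included, and all names are hypothetical.

```python
import re

# A few rules adapted from the script above, as ordered (regex, replacement) pairs
MODEL_RULES = [
    (r"^Motorola Nexus 6$", "Nexus 6"),
    (r"^LG V20$", "V20"),
    (r"^Galaxy S7 edge$", "Galaxy S7 Edge"),
    (r"^Nexus 5X$", "Nexus 5x"),
    (r"^Nexus 6P$", "Nexus 6p"),
]
COMPILED_RULES = [(re.compile(p), r) for p, r in MODEL_RULES]

# Manufacturer overrides keyed by the normalized model name
MODEL_TO_MANUFACTURER = {"Nexus 6": "Motorola", "Nexus 5x": "LG", "Nexus 6p": "Huawei"}


def normalize_model(name: str) -> str:
    """Collapse whitespace, then apply each rule in order.

    In Python 3 str patterns, \\s also matches non-breaking spaces.
    """
    name = re.sub(r"\s+", " ", name).strip()
    for pattern, replacement in COMPILED_RULES:
        name = pattern.sub(replacement, name)
    return name


def normalize_manufacturer(model: str, reported: str) -> str:
    """Override the reported manufacturer for models whose maker is known."""
    return MODEL_TO_MANUFACTURER.get(model, reported)


print(normalize_model("  Motorola\xa0 Nexus 6 "))
print(normalize_manufacturer(normalize_model("Nexus 6P"), "Google"))
```

On a DataFrame, this could be applied as `dfCar['Model'] = dfCar['Model'].map(normalize_model)` after missing model names have been dropped or filled.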

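_Carrier_normalize_and_timeFilter.py, shown below, normalizes the data. It:
  • collapses bulletin levels to the first of the month;
  • removes duplicates;
  • recomputes the latency;
  • corrects a few outliers;
  • keeps only updates whose latency is between -10 and 150 days.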

#Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

carrierDF=pd.read_csv('../data/allCarrierData_merged_years.csv')
print(len(carrierDF))
# Drop duplicates
carrierDF.drop_duplicates(keep='first', inplace=True ) # drop whole duplicate rows
#carrierDF[carrierDF.duplicated(keep=False)]

carrierDF['checker']=carrierDF.Bulletin_Level.apply(lambda x: str(x)[:10])
carrierDF['checker']=carrierDF.checker.apply(lambda x: str(x)[-2:])
print(carrierDF[carrierDF.checker!="01"])
######## Filter down to approprate timeline ########
print(len(carrierDF))

carrierDF.Bulletin_Level=carrierDF.Bulletin_Level.astype(str)
carrierDF.Bulletin_Level=carrierDF.Bulletin_Level.str.replace(" [0:]*","")
carrierDF.Bulletin_Level=carrierDF.Bulletin_Level.apply(lambda x: str(x[:-2])+"01") # convert all bulletins to first day

# setting the bad bulletins
#badBulletins=['2019-09-01', '2019-08-01', '2019-07-01']

# for badBul in badBulletins: # filtering...
#     carrierDF=carrierDF[carrierDF.Bulletin_Level!=badBul]
print(len(carrierDF))
carrierDF[carrierDF.duplicated(subset=['Carrier','Manufacture','Model','Bulletin_Level'],keep=False)]
# Remove duplicate carrier, manufacture, phone, bulletin pairings. Keep the earliest release date
carrierDF=carrierDF.sort_values(by=['Carrier','Manufacture','Model','Bulletin_Level','difference'])# Sort to properly drop duplicates if bulletins were normalized to the same (e.g., level 1 and 5 were both patched thus keep the first date)
carrierDF.drop_duplicates(subset=['Carrier','Manufacture','Model','Bulletin_Level'],keep='first', inplace=True ) # drop the duplicates and keep the lowest difference
print(len(carrierDF))

for car in carrierDF.Carrier.unique():
    print(car)
    print(len(carrierDF[carrierDF.Carrier==car]))
# errors in parsing verizon...
carrierDF=carrierDF.sort_values(by=['Carrier','Manufacture','Model','Release_Date','Bulletin_Level'])
carrierDF.drop_duplicates(subset=['Carrier','Manufacture','Model','Release_Date','Bulletin_Level'],keep='first', inplace=True ) 

print(len(carrierDF))
# Recalculate difference!
carrierDF.Bulletin_Level=pd.to_datetime(carrierDF.Bulletin_Level)
carrierDF.Release_Date=pd.to_datetime(carrierDF.Release_Date)

newDif=carrierDF.Release_Date-carrierDF.Bulletin_Level
newDif=newDif.apply(lambda x: x.days)
carrierDF.difference=newDif.copy()
print(carrierDF.head())
# Remove outliers!!
index=carrierDF[((carrierDF.Model=='Galaxy A6') & (carrierDF.difference==-312))].index
carrierDF.loc[index,'Release_Date']=pd.to_datetime("2019-05-24")  # .loc accepts an Index of labels; .at does not

index=carrierDF[((carrierDF.Model=='Galaxy Tab A 8.0 (2018)') & (carrierDF.difference==-71))].index
carrierDF.loc[index,'Release_Date']=pd.to_datetime("2019-06-22")

#The following is a use case from using the assumption of the "same" year even though Jan starts a new year
index=carrierDF[((carrierDF.Model=='Galaxy Note9') & (carrierDF.difference==-329))].index
carrierDF.loc[index,'Bulletin_Level']=pd.to_datetime("2018-12-01")


# Recalculate readjusted outliers
carrierDF.Bulletin_Level=pd.to_datetime(carrierDF.Bulletin_Level)
carrierDF.Release_Date=pd.to_datetime(carrierDF.Release_Date)

newDif=carrierDF.Release_Date-carrierDF.Bulletin_Level
newDif=newDif.apply(lambda x: x.days)
carrierDF.difference=newDif.copy()

carrierDF=carrierDF[carrierDF.difference>=-10]
carrierDF=carrierDF[carrierDF.difference<=150]


# Now change T-Mobile manufacturer to Revvl (T-Mobile)
carrierDF.Manufacture=carrierDF.Manufacture.str.replace("T-Mobile","Revvl(T-Mobile)")
carrierDF.Carrier=carrierDF.Carrier.str.replace("TMobile","T-Mobile")
print(len(carrierDF))
print(len(carrierDF.Model.unique()))
print(len(carrierDF.Manufacture.unique()))
print()
for car in carrierDF.Carrier.unique():
    print(car)
    print(len(carrierDF[carrierDF.Carrier==car]))
carrierDF.sort_values(by=['Carrier','Manufacture','Model','Release_Date'],inplace=True)
carrierDF.to_csv('../data/allCarrierData_final.csv',index=False)
print(carrierDF.head())
print(carrierDF.Manufacture.unique())
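The cleaning steps above boil down to three operations: keep the earliest release per (carrier, manufacturer, model, bulletin), recompute the latency in days, and drop values outside the [-10, 150] day window. Below is a minimal sketch of that logic as one function, assuming the same column names; the toy rows and the function name `clean_latency` are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the ones used above.
df = pd.DataFrame({
    "Carrier":        ["AT&T", "AT&T", "Sprint", "Sprint"],
    "Manufacture":    ["Samsung", "Samsung", "LG", "LG"],
    "Model":          ["Galaxy S9", "Galaxy S9", "G7", "G7"],
    "Bulletin_Level": ["2019-05-01", "2019-05-01", "2019-06-01", "2019-01-01"],
    "Release_Date":   ["2019-05-20", "2019-06-03", "2019-06-10", "2019-12-30"],
})

def clean_latency(df, low=-10, high=150):
    """Keep the earliest release per (carrier, maker, model, bulletin) and
    return rows whose latency in days lies within [low, high]."""
    out = df.copy()
    out["Bulletin_Level"] = pd.to_datetime(out["Bulletin_Level"])
    out["Release_Date"] = pd.to_datetime(out["Release_Date"])
    # Latency = days between the bulletin date and the carrier's release date.
    out["difference"] = (out["Release_Date"] - out["Bulletin_Level"]).dt.days
    key = ["Carrier", "Manufacture", "Model", "Bulletin_Level"]
    # Sorting by latency inside each key makes keep="first" retain the earliest release.
    out = out.sort_values(key + ["difference"]).drop_duplicates(subset=key, keep="first")
    # Drop implausible latencies (likely parsing errors) outside the window.
    return out[out["difference"].between(low, high)]

result = clean_latency(df)
print(result[["Carrier", "Model", "difference"]])
```

The 363-day Sprint row is dropped by the window filter and the later AT&T duplicate is dropped by the de-duplication step.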

The code for _Carrier_Data_Statistics.py is as follows:

allCarriers=carrierDF.Carrier.unique()
allManufactures=carrierDF.Manufacture.unique()
allModels=carrierDF.Model.unique()
allBulletin=carrierDF.Bulletin_Level.unique()
allReleaseYears=carrierDF.Year_Phone_Release.unique()

print("All Updates: " + str(len(carrierDF)))
print("Unique Models: " + str(len(allModels)))
print("Unique Manufacture: " + str(len(allManufactures))) 
print("Unique Bulletins: " + str(len(allBulletin))) 

tmp=carrierDF.copy()
tmp.sort_values(by=['Year_Phone_Release','Month_Phone_Release'],inplace=True)
# Row 0 is the first row after each sort
print("Earliest Released Model: " + str(tmp.iloc[0,2]) + " on "+ str(tmp.iloc[0,6])+"-" + str(tmp.iloc[0,7]))
tmp.sort_values(by=['Year_Phone_Release','Month_Phone_Release'],inplace=True,ascending=False)
print("Latest Released Model: " + str(tmp.iloc[0,2]) + " on "+ str(tmp.iloc[0,6])+"-" + str(tmp.iloc[0,7]))

tmp.sort_values(by=['Release_Date','Bulletin_Level'],inplace=True)
print("Earliest Released Patch: " + str(tmp.iloc[0,2]) + " on "+ str(tmp.iloc[0,3]))

tmp.sort_values(by=['Release_Date','Bulletin_Level'],inplace=True,ascending=False)
print("Latest Released Patch: " + str(tmp.iloc[0,2]) + " on "+ str(tmp.iloc[0,3]))
############### Total Number of Updates ###############
allRows=[]
length=0
for carrier in allCarriers:
    length=len(carrierDF[carrierDF.Carrier==carrier])
    allRows.append([str(carrier),length])
    
statsDF=pd.DataFrame(allRows,columns=['Carrier','Num_Updates'])
############## Stats for Carrier ###############
# Unique Number of Manufactures
newCol=[]
length=0
for carrier in allCarriers:
    dfTemp=carrierDF[carrierDF.Carrier==carrier]
    length=len(dfTemp.Manufacture.unique())
    newCol.append(length)

statsDF['Unique_Manufacture']=newCol.copy()

# Unique Number of Phone Models
newCol=[]
length=0
for carrier in allCarriers:
    dfTemp=carrierDF[carrierDF.Carrier==carrier]
    length=len(dfTemp.Model.unique())
    newCol.append(length)

statsDF['Unique_Models']=newCol.copy()

# Unique Number of Bulletins
newCol=[]
length=0
for carrier in allCarriers:
    dfTemp=carrierDF[carrierDF.Carrier==carrier]
    length=len(dfTemp.Bulletin_Level.unique())
    newCol.append(length)

statsDF['Unique_Bulletins']=newCol.copy()

# Timeline: Earliest Release date
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=carrierDF[carrierDF.Carrier==carrier]
    minDifIndex=np.argmin(tempDF.Release_Date.values)
    newCol.append(tempDF.iloc[minDifIndex].Release_Date)
    
statsDF['Earliest_Release']=newCol.copy()

# Timeline: Latest Release date
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=carrierDF[carrierDF.Carrier==carrier]
    minDifIndex=np.argmax(tempDF.Release_Date.values)
    newCol.append(tempDF.iloc[minDifIndex].Release_Date)
    
statsDF['Latest_Release']=newCol.copy()


# Timeline: Earliest Bulletin date
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=carrierDF[carrierDF.Carrier==carrier]
    minDifIndex=np.argmin(tempDF.Bulletin_Level.values)
    newCol.append(tempDF.iloc[minDifIndex].Bulletin_Level)
    
statsDF['Earliest_Bulletin']=newCol.copy()

# Timeline: Latest Bulletin date
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=carrierDF[carrierDF.Carrier==carrier]
    minDifIndex=np.argmax(tempDF.Bulletin_Level.values)
    newCol.append(tempDF.iloc[minDifIndex].Bulletin_Level)
    
statsDF['Latest_Bulletin']=newCol.copy()


# Average bulletins per manufacture
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=statsDF[statsDF.Carrier==carrier]
    calc=tempDF.Unique_Bulletins/tempDF.Unique_Manufacture
    newCol.append(calc.iloc[0])

statsDF['Bulletins_per_Manufacture']=newCol.copy()
    
# Average bulletins per model
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=statsDF[statsDF.Carrier==carrier]
    calc=tempDF.Unique_Bulletins/tempDF.Unique_Models
    newCol.append(calc.iloc[0])

statsDF['Bulletins_per_Model']=newCol.copy()
print(statsDF.head())
# Timeline: earliest and latest phone release year per carrier
tmp=carrierDF.copy() # copy so the in-place sorts below do not reorder carrierDF
tmp.sort_values(by=['Year_Phone_Release','Month_Phone_Release'],inplace=True,ascending=True)

newCol=[]
length=0
for carrier in allCarriers:
    tempDF=tmp[tmp.Carrier==carrier]
    minDifIndex=np.argmin(tempDF.Year_Phone_Release.values)
    newCol.append(tempDF.iloc[minDifIndex].Year_Phone_Release)
    print(carrier)
    print(tempDF.iloc[minDifIndex])
    print()
    
print("----------------------------------------------")
tmp.sort_values(by=['Year_Phone_Release','Month_Phone_Release'],inplace=True,ascending=False)    
newCol=[]
length=0
for carrier in allCarriers:
    tempDF=tmp[tmp.Carrier==carrier]
    maxDifIndex=np.argmax(tempDF.Year_Phone_Release.values)
    newCol.append(tempDF.iloc[maxDifIndex].Year_Phone_Release)
    print(carrier)
    print(tempDF.iloc[maxDifIndex])
    print()
print(carrierDF.columns.tolist())
# Graph for each carrier and latency from lowest to highest
import plotly as py
import plotly.graph_objs as go
py.offline.init_notebook_mode(connected=True)

for carrier in carrierDF.Carrier.unique():
    print("-------------------------------------")
    print(carrier)
    data = []
    temp=carrierDF[carrierDF.Carrier==carrier].copy() # copy avoids SettingWithCopyWarning when modifying the slice
    temp.difference=temp.difference.astype(float)
    temp.sort_values(by=['difference','Manufacture','Model'],inplace=True)
    
    difCounts = temp.groupby('difference')['difference'].count()
    difCounts=difCounts.sort_index(ascending=True,axis='index')
    #print(difCounts)
    
    
    data.append(go.Bar(
        x=difCounts.index,
        y=difCounts,
        name=str(carrier),
        #marker=dict(color='rgb'+str(lvlColor[car]))
    )) 
    
    # Edit the layout
    layout = dict(title = "Update Latency for " + str(carrier),
                  yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
                  xaxis = dict(title = 'Latency (Days)'),
                  plot_bgcolor='rgba(0,0,0,0)',
                  )

    fig = dict(data=data, layout=layout)
    py.offline.iplot(fig)
    # for each manufacture, add 
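The per-carrier bar chart above plots how many updates were observed at each latency value. Here is a minimal sketch of the counting step on hypothetical data; `latency_histogram` is an illustrative helper, and `value_counts().sort_index()` is used in place of the `groupby('difference').count()` call, which should yield the same counts.

```python
import pandas as pd

# Hypothetical latencies (days) for two carriers.
df = pd.DataFrame({
    "Carrier":    ["AT&T", "AT&T", "AT&T", "Sprint", "Sprint"],
    "difference": [12, 12, 30, 5, 40],
})

def latency_histogram(df, carrier):
    """Number of updates observed at each latency value for one carrier."""
    lat = df.loc[df["Carrier"] == carrier, "difference"]
    # value_counts counts rows per latency value (NaN dropped, like groupby count);
    # sort_index orders the bars from the lowest to the highest latency.
    return lat.value_counts().sort_index()

hist = latency_histogram(df, "AT&T")
```

The resulting Series can be passed directly as `x=hist.index, y=hist` to `go.Bar`.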
distTest=carrierDF
distTest = distTest.groupby(['Manufacture'])['Model'].nunique()
distTest=distTest.sort_values(ascending=False)


itemsX=[] # number of unique phone models as x axis
itemsY=[] # number of manufactures as y axis
for i in distTest.unique():
    temp=distTest[distTest==i]
    itemsX.append(i)
    itemsY.append(len(temp))
    
print(itemsX)
print(itemsY)

data=[]
data.append(go.Bar(
        x=itemsX,
        y=itemsY,
        #marker=dict(color='rgb'+str(lvlColor[car]))
    )) 
    
# Edit the layout
layout = dict(#title = "Update Latency across Carriers",
              yaxis = dict(title = 'Number of Manufacturers',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              xaxis = dict(title = 'Number of Unique Models'),
              plot_bgcolor='rgba(0,0,0,0)',
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig)
print(len(carrierDF.Manufacture.unique()))
print(len(carrierDF.Model.unique()))
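The manufacturer distribution above counts, for each number of unique models, how many manufacturers reach it. Assuming the same `Manufacture` and `Model` columns, the loop over `distTest.unique()` can likely be replaced by chaining `nunique` and `value_counts`; the toy data below is hypothetical.

```python
import pandas as pd

# Hypothetical rows: one per reported update.
df = pd.DataFrame({
    "Manufacture": ["Samsung", "Samsung", "Samsung", "LG", "LG", "Nokia"],
    "Model":       ["S9", "S10", "S9", "G7", "V40", "7.1"],
})

# Number of distinct phone models per manufacturer.
models_per_maker = df.groupby("Manufacture")["Model"].nunique()
# Index: number of unique models (x axis); values: number of manufacturers (y axis).
dist = models_per_maker.value_counts().sort_index()
```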
# How many devices updated in 2019?
Bul2019Only=carrierDF[pd.to_datetime(carrierDF.Bulletin_Level).dt.year==2019] # works whether Bulletin_Level holds strings or datetimes
print(len(Bul2019Only.Model.unique()))
print(Bul2019Only.Model.unique())


#PIE: from August 6, 2018 and later
#ANDROID 10: September 3, 2019 and later


print(carrierDF[carrierDF.Manufacture.str.contains("Essential")]) 
print(carrierDF.Manufacture.unique())   
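The per-carrier statistics above are built one column at a time with separate loops. Below is a sketch of the same table using pandas named aggregation, assuming the column names used above and already-parsed datetime columns; the rows are hypothetical. Note that `"count"` counts non-null `Model` values, which matches the row count used above only when `Model` has no missing values.

```python
import pandas as pd

# Hypothetical rows with the same columns as carrierDF.
df = pd.DataFrame({
    "Carrier": ["AT&T", "AT&T", "AT&T", "Sprint"],
    "Manufacture": ["Samsung", "Samsung", "LG", "LG"],
    "Model": ["S9", "S9", "G7", "G7"],
    "Bulletin_Level": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-01", "2019-03-01"]),
    "Release_Date": pd.to_datetime(["2019-01-20", "2019-02-15", "2019-01-30", "2019-03-10"]),
})

# One row per carrier; each keyword is (source column, aggregation).
stats = df.groupby("Carrier").agg(
    Num_Updates=("Model", "count"),
    Unique_Manufacture=("Manufacture", "nunique"),
    Unique_Models=("Model", "nunique"),
    Unique_Bulletins=("Bulletin_Level", "nunique"),
    Earliest_Release=("Release_Date", "min"),
    Latest_Release=("Release_Date", "max"),
    Earliest_Bulletin=("Bulletin_Level", "min"),
    Latest_Bulletin=("Bulletin_Level", "max"),
)
# Ratios are computed column-wise instead of row by row.
stats["Bulletins_per_Manufacture"] = stats["Unique_Bulletins"] / stats["Unique_Manufacture"]
stats["Bulletins_per_Model"] = stats["Unique_Bulletins"] / stats["Unique_Models"]
stats = stats.reset_index()
```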

The code for _Carrier_Number_Updates is as follows:

import pandas as pd
import numpy as np
import researchpy # pip install researchpy

import plotly as py
import plotly.graph_objs as go
py.offline.init_notebook_mode(connected=True)

carrierDF=pd.read_csv('../data/allCarrierData_final.csv')
carrierDF['Year_Patch_Release']=carrierDF.Release_Date.copy()
carrierDF['Year_Patch_Release']=carrierDF.Year_Patch_Release.apply(lambda x: x[:4])
print(carrierDF.head())
#### Variables
allCarriers=carrierDF.Carrier.unique()
allManufactures=carrierDF.Manufacture.unique()
allManUpdates=[]
sortedManufactures=[]
data=[]
for manufacture in allManufactures:
    print(manufacture)
    allManUpdates.append(len(carrierDF[carrierDF.Manufacture==manufacture]))
    sortedManufactures.append(manufacture)


sortedManufactures = [x for _,x in sorted(zip(allManUpdates,sortedManufactures),reverse=True)]
allManUpdates = sorted(allManUpdates,reverse=True)

print(sortedManufactures)
print(allManUpdates)

for index,manufacture in enumerate(sortedManufactures):
    data.append(go.Bar(
        x=[str(manufacture)],
        y=[allManUpdates[index]],
        name=str(manufacture)
    ))

    
    
# Edit the layout
layout = dict(title = "Number of Updates across Manufactures",
              yaxis = dict(title = 'Number of Reported Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              xaxis = dict(title = 'Manufacture'),
              plot_bgcolor='rgba(0,0,0,0)',
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='avg-latency-since-release-years.png')
# For each carrier:
# count how many of all the updates belong to each manufacturer
# and graph the shares as a pie chart
labels = sortedManufactures
values = allManUpdates

fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.show()
from plotly.subplots import make_subplots

labels = sortedManufactures

# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows=2, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]])


counts=carrierDF.groupby(['Carrier', 'Manufacture']).size().reset_index(name='counts')
formatting=[1,1]
for carrier in carrierDF.Carrier.unique():
    countsFiltered=counts[counts.Carrier==carrier]
    values=[]
    for manufacture in labels:
        newVal=countsFiltered[countsFiltered.Manufacture==manufacture]
        if len(newVal.counts.values)>0:
            values.append(newVal.counts.values[0])
        else:
            values.append(0)
    fig.add_trace(go.Pie(labels=labels, values=values.copy(), name=str(carrier)), formatting[0], formatting[1])
    if formatting==[1,1]:
        formatting=[1,2]
    elif formatting==[1,2]:
        formatting=[2,1]
    elif formatting==[2,1]:
        formatting=[2,2]
        

# # Use `hole` to create a donut-like pie chart
fig.update_traces(textposition="inside",hole=.4, hoverinfo="label+percent+name",textinfo='percent+label')

fig.update_layout(
    title_text="Carrier and Manufacture Security Update Distribution",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='AT&T', x=0.20, y=0.82, font_size=14, showarrow=False),
                 dict(text='Verizon', x=0.81, y=0.19, font_size=14, showarrow=False),
                dict(text='T-Mobile', x=0.19, y=0.18, font_size=14, showarrow=False),
                dict(text='Sprint', x=0.80, y=0.82, font_size=14, showarrow=False)],
    showlegend=False)
fig.show()    
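The bar chart and the donut charts above rely on two tallies: updates per manufacturer overall, and updates per (carrier, manufacturer) pair with missing pairs set to zero. Here is a sketch of both with `value_counts` and `pd.crosstab`, on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows: one per reported update.
df = pd.DataFrame({
    "Carrier": ["AT&T", "AT&T", "Sprint", "Sprint", "Sprint"],
    "Manufacture": ["Samsung", "LG", "Samsung", "Samsung", "Motorola"],
})

# Overall update count per manufacturer, largest first.
per_maker = df["Manufacture"].value_counts()
# One row per carrier, one column per manufacturer; absent pairs become 0,
# which is what the nested loop with `values.append(0)` builds by hand.
per_carrier = pd.crosstab(df["Carrier"], df["Manufacture"]).reindex(
    columns=per_maker.index, fill_value=0
)
```

Each row of `per_carrier` can then be fed as the `values` of one `go.Pie` trace, with `per_maker.index` as the shared labels.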

The code for _Update_Frequency is as follows:

import pandas as pd
import numpy as np
import researchpy # pip install researchpy
import statistics

import plotly as py
import plotly.graph_objs as go
py.offline.init_notebook_mode(connected=True)
import plotly.io as pio
pio.orca.config.executable = '/Users/kjo/anaconda3/bin/orca'
%config InlineBackend.figure_format = 'svg'


carrierDF=pd.read_csv('../data/allCarrierData_final.csv')
carrierDF.Manufacture=carrierDF.Manufacture.str.replace("Samgung","Samsung")
carrierDF.difference=carrierDF.difference.astype('float')


carrierDF=carrierDF[carrierDF.Carrier!="Verizon"] # remove verizon


#save the figures?
saveFigs=True

print(carrierDF.head())
print(carrierDF[(carrierDF.Manufacture=="Google")])
uniqueCarriers=carrierDF.Carrier.unique()
uniqueManufactures=carrierDF.Manufacture.unique()

# Fixed RGB colour per carrier
lvlColor={}
carrierColors={'AT&T':[64, 190, 245],'Sprint':[255,162,0],'T-Mobile':[255,0,238],'Verizon':[255,21,0]}
print(uniqueCarriers)
for carrier in uniqueCarriers:
    if carrier not in lvlColor:
        lvlColor[carrier]=tuple(carrierColors[carrier])

manufactureColors={'BlackBerry':[38, 38, 38],'HTC':[189, 15, 108],'LG':[245, 17, 108],'Motorola':[1, 11, 120],
                   'Samsung':[10, 63, 148],'Google':[137, 214, 60],'Huawei':[191, 11, 11],
                   'Kyocera':[245, 118, 118],'ZTE':[102, 255, 247],'Essential':[20, 10, 46],
                   'ASUS':[12, 55, 59],'OnePlus':[227, 34, 73],'Alcatel':[0, 188, 217],
                   'RED':[219, 0, 0],'Razer':[219, 0, 0],'CAT':[255, 179, 0],'Coolpad':[237, 150, 0],
                   'Slate':[50, 0, 61],"Sonim":[219, 0, 22],"Sony":[23, 23, 23],'Revvl(T-Mobile)':[255,0,238],
                   "Nokia":[14, 0, 171],'Orbic':[139, 214, 0],"Palm":[255,21,0]}
print(uniqueManufactures)
# colors to work with: https://www.google.com/search?q=rgb+color+picker&oq=rgb+colo&aqs=chrome.0.69i59j0j69i57j0l3.1905j1j7&sourceid=chrome&ie=UTF-8
for index,manufacture in enumerate(uniqueManufactures):
    if not manufacture in lvlColor:
        lvlColor[manufacture]=(manufactureColors[manufacture][0], manufactureColors[manufacture][1], manufactureColors[manufacture][2])
# Number of bulletins per year for a phone

# Calculate the dates of the first year which consists of the NEXT month from the release into the next year.
nextMonth=carrierDF.Month_Phone_Release.copy()
nextMonth=nextMonth.add(1)
nextYear=carrierDF.Year_Phone_Release.copy()
nextYear=nextYear.add(1)
for index, month in enumerate(nextMonth):
        if month > 12: # meaning a new year!
            nextMonth.iloc[index]=1
            nextYear.iloc[index]+=1
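The loop above moves each release to the following month and rolls month 13 over into January. A vectorized sketch, assuming integer `Year_Phone_Release` and `Month_Phone_Release` columns, builds the first-of-month release date and lets `pd.DateOffset` handle the December-to-January rollover, so the year and month cannot drift apart. The variable names `release_month`, `first_start`, and `first_end` are illustrative.

```python
import pandas as pd

# Hypothetical release months; the second phone was released in December.
df = pd.DataFrame({"Year_Phone_Release": [2018, 2017], "Month_Phone_Release": [5, 12]})

# First day of the release month, built from the integer year/month columns.
release_month = pd.to_datetime(
    pd.DataFrame({"year": df["Year_Phone_Release"],
                  "month": df["Month_Phone_Release"],
                  "day": 1})
)
# The first counting period starts on the first day of the NEXT month;
# DateOffset moves December into January of the following year.
first_start = release_month + pd.DateOffset(months=1)
first_end = first_start + pd.DateOffset(years=1)
```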


# #print(carrierDF.head())

######################################################################### 
################## SET HOW MANY MONTHS OF UPDATES HERE ################## 
######################################################################### 
#howManyMonths=12 # # For year 1 through 4...
howManyMonths=6 # For every 6 months
#howManyMonths=3 # For every 3 months after release
######################################################################### 
if howManyMonths==12:
    latencyPointsCar_total={}
    for car in carrierDF.Carrier.unique():
        latencyPointsCar_total[car]={}
        for yr in range(4):
            latencyPointsCar_total[car][yr]=0
    #For each month that is greater than 12, set to 1 and add 1 to the year
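    carrierDF['First_Year_Start']=(carrierDF.Year_Phone_Release+(carrierDF.Month_Phone_Release==12).astype(int)).map(str)+"-"+nextMonth.map(str)+"-01" # December releases start in January of the next year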
    carrierDF['First_Year_Start']=pd.to_datetime(carrierDF['First_Year_Start'])
    carrierDF['First_Year_End']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['First_Year_End']=pd.to_datetime(carrierDF['First_Year_End'])

    carrierDF['Second_Year_Start']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['Second_Year_Start']=pd.to_datetime(carrierDF['Second_Year_Start'])
    nextYear=nextYear.copy().add(1)
    carrierDF['Second_Year_End']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['Second_Year_End']=pd.to_datetime(carrierDF['Second_Year_End'])

    carrierDF['Third_Year_Start']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['Third_Year_Start']=pd.to_datetime(carrierDF['Third_Year_Start'])
    nextYear=nextYear.copy().add(1)
    carrierDF['Third_Year_End']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['Third_Year_End']=pd.to_datetime(carrierDF['Third_Year_End'])

    carrierDF['Fourth_Year_Start']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['Fourth_Year_Start']=pd.to_datetime(carrierDF['Fourth_Year_Start'])
    nextYear=nextYear.copy().add(1)
    carrierDF['Fourth_Year_End']=nextYear.map(str)+"-"+nextMonth.map(str)+"-01"
    carrierDF['Fourth_Year_End']=pd.to_datetime(carrierDF['Fourth_Year_End'])

    importantCols=[['First_Year_Start','First_Year_End'],['Second_Year_Start','Second_Year_End'],['Third_Year_Start','Third_Year_End'],['Fourth_Year_Start','Fourth_Year_End']]
    counts=[[0,0],[0,0],[0,0],[0,0]] # Each list represents a year: [number of updates, number of phones]
    indvCounts=[[],[],[],[]]
    indvCounts_all=[[],[],[],[]] # per-phone update counts, including phones without updates
    latencyPoints=[[],[],[],[]]
    noUpdates=[0,0,0,0]
elif howManyMonths==6:
    latencyPointsCar_total={}
    for car in carrierDF.Carrier.unique():
        latencyPointsCar_total[car]={}
        for hfyear in range(8):
            latencyPointsCar_total[car][hfyear]=0
            
    carrierDF['Mobile_Release_Date']=carrierDF.Year_Phone_Release.map(str)+"-"+carrierDF.Month_Phone_Release.map(str)+"-01"
    carrierDF['Mobile_Release_Date']=pd.to_datetime(carrierDF['Mobile_Release_Date'])
    carrierDF['Mobile_Release_Date']=carrierDF['Mobile_Release_Date']+ pd.DateOffset(months=1)
    
    importantCols=[['First_1Year_Start','First_1Year_End'],['Second_1Year_Start','Second_1Year_End'],['First_2Year_Start','First_2Year_End'],['Second_2Year_Start','Second_2Year_End'],['First_3Year_Start','First_3Year_End'],['Second_3Year_Start','Second_3Year_End'],['First_4Year_Start','First_4Year_End'],['Second_4Year_Start','Second_4Year_End']]
    
    carrierDF['First_1Year_Start']=(carrierDF.Year_Phone_Release+(carrierDF.Month_Phone_Release==12).astype(int)).map(str)+"-"+nextMonth.map(str)+"-01" # December releases start in January of the next year
    carrierDF['First_1Year_Start']=pd.to_datetime(carrierDF['First_1Year_Start'])
    carrierDF['First_1Year_End']=carrierDF['First_1Year_Start'] + pd.DateOffset(months=6)

    carrierDF['Second_1Year_Start']=carrierDF['First_1Year_End'].copy()
    carrierDF['Second_1Year_End']=carrierDF['Second_1Year_Start'] + pd.DateOffset(months=6)

    carrierDF['First_2Year_Start']=carrierDF['Second_1Year_End'].copy()
    carrierDF['First_2Year_End']=carrierDF['First_2Year_Start'] + pd.DateOffset(months=6)

    carrierDF['Second_2Year_Start']=carrierDF['First_2Year_End'].copy()
    carrierDF['Second_2Year_End']=carrierDF['Second_2Year_Start'] + pd.DateOffset(months=6)

    carrierDF['First_3Year_Start']=carrierDF['Second_2Year_End'].copy()
    carrierDF['First_3Year_End']=carrierDF['First_3Year_Start'] + pd.DateOffset(months=6)

    carrierDF['Second_3Year_Start']=carrierDF['First_3Year_End'].copy()
    carrierDF['Second_3Year_End']=carrierDF['Second_3Year_Start'] + pd.DateOffset(months=6)

    carrierDF['First_4Year_Start']=carrierDF['Second_3Year_End'].copy()
    carrierDF['First_4Year_End']=carrierDF['First_4Year_Start'] + pd.DateOffset(months=6)

    carrierDF['Second_4Year_Start']=carrierDF['First_4Year_End'].copy()
    carrierDF['Second_4Year_End']=carrierDF['Second_4Year_Start'] + pd.DateOffset(months=6)
    
    counts=[[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0]] # Each list represents a 6-month period: [number of updates, number of phones]
    indvCounts=[[],[],[],[],[],[],[],[]]
    indvCounts_all=[[],[],[],[],[],[],[],[]]
    latencyPoints=[[],[],[],[],[],[],[],[]]
    noUpdates=[0,0,0,0,0,0,0,0]
    
elif howManyMonths==3:
    latencyPointsCar_total={}
    for car in carrierDF.Carrier.unique():
        latencyPointsCar_total[car]={}
        for hfyear in range(int((12/howManyMonths)*4)):
            latencyPointsCar_total[car][hfyear]=0
            
    carrierDF['Mobile_Release_Date']=carrierDF.Year_Phone_Release.map(str)+"-"+carrierDF.Month_Phone_Release.map(str)+"-01"
    carrierDF['Mobile_Release_Date']=pd.to_datetime(carrierDF['Mobile_Release_Date'])
    carrierDF['Mobile_Release_Date']=carrierDF['Mobile_Release_Date']+ pd.DateOffset(months=1)
    
    importantCols=[['1_1Year_Start','1_1Year_End'],['2_1Year_Start','2_1Year_End'],['3_1Year_Start','3_1Year_End'],['4_1Year_Start','4_1Year_End'],
                   ['1_2Year_Start','1_2Year_End'],['2_2Year_Start','2_2Year_End'],['3_2Year_Start','3_2Year_End'],['4_2Year_Start','4_2Year_End'],
                   ['1_3Year_Start','1_3Year_End'],['2_3Year_Start','2_3Year_End'],['3_3Year_Start','3_3Year_End'],['4_3Year_Start','4_3Year_End'],
                   ['1_4Year_Start','1_4Year_End'],['2_4Year_Start','2_4Year_End'],['3_4Year_Start','3_4Year_End'],['4_4Year_Start','4_4Year_End']
                  ]
    
    for index,value in enumerate(importantCols):
        if index==0:
            carrierDF[str(value[0])]=(carrierDF.Year_Phone_Release+(carrierDF.Month_Phone_Release==12).astype(int)).map(str)+"-"+nextMonth.map(str)+"-01" # December releases start in January of the next year
            carrierDF[str(value[0])]=pd.to_datetime(carrierDF[str(value[0])])
            carrierDF[str(value[1])]=carrierDF[str(value[0])] + pd.DateOffset(months=3)
        else:
            carrierDF[str(value[0])]=carrierDF[importantCols[index-1][1]].copy()
            carrierDF[str(value[1])]=carrierDF[str(value[0])] + pd.DateOffset(months=3)
                    
    counts=[[0,0] for _ in range(int(12/howManyMonths)*4)] # Each list represents a 3-month period: [number of updates, number of phones]
    indvCounts=[[] for _ in range(int(12/howManyMonths)*4)]
    indvCounts_all=[[] for _ in range(int(12/howManyMonths)*4)]
    latencyPoints=[[] for _ in range(int(12/howManyMonths)*4)]
    noUpdates=[0 for _ in range(int(12/howManyMonths)*4)]
    
carrierDF.Bulletin_Level=pd.to_datetime(carrierDF.Bulletin_Level)
carrierDF.Release_Date=pd.to_datetime(carrierDF.Release_Date)
THE_START_OF_TIMELINE=np.argmin(carrierDF.Bulletin_Level.values)
THE_START_OF_TIMELINE=carrierDF.iloc[THE_START_OF_TIMELINE].Bulletin_Level

THE_END_OF_TIMELINE=np.argmax(carrierDF.Bulletin_Level.values)
THE_END_OF_TIMELINE=carrierDF.iloc[THE_END_OF_TIMELINE].Bulletin_Level
# print(THE_START_OF_TIMELINE)
# print(THE_END_OF_TIMELINE)
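The three `howManyMonths` branches above all build consecutive [start, end) windows after release, and the loops below keep only the windows that fall entirely inside the observed bulletin timeline. Here is a sketch that generalizes this to any period length; `period_windows` and `fully_inside` are hypothetical helpers, not part of the original scripts.

```python
import pandas as pd

def period_windows(first_start, months_per_period=6, years=4):
    """Consecutive [start, end) windows covering `years` years after first_start."""
    n_periods = (12 // months_per_period) * years
    starts = [first_start + pd.DateOffset(months=k * months_per_period)
              for k in range(n_periods)]
    return [(s, s + pd.DateOffset(months=months_per_period)) for s in starts]

def fully_inside(windows, timeline_start, timeline_end):
    """Keep windows lying completely within the bulletin timeline, mirroring
    `yrBEG >= THE_START_OF_TIMELINE and yrEND < THE_END_OF_TIMELINE`."""
    return [(s, e) for s, e in windows if s >= timeline_start and e < timeline_end]

w = period_windows(pd.Timestamp("2018-01-01"), months_per_period=6)
inside = fully_inside(w, pd.Timestamp("2017-07-01"), pd.Timestamp("2019-09-01"))
```

With 6-month periods this yields the same 8 windows as the `First_1Year_*` ... `Second_4Year_*` columns. With `months_per_period=3` it yields the 16 windows of the 3-month branch.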

allModels=[]
allModelsperDuration={}
for yr in range(int(12/howManyMonths)*4):
    allModelsperDuration[yr]=list()

uniqueCars=carrierDF.Carrier.unique()
count=0
for model in carrierDF.Model.unique():
    tempDFs=carrierDF[carrierDF.Model==model]
    for index, year in enumerate(importantCols): # traverse through section
        yrBEG=tempDFs[year[0]].iloc[0]#.unique()[0]
        yrEND=tempDFs[year[1]].iloc[0]#.unique()[0]

        #print(str(yrBEG)+" ; "+str(yrEND))
        if yrBEG >= THE_START_OF_TIMELINE and yrEND < THE_END_OF_TIMELINE: #the particular year is fully within our timeline :)
            counts[index][1]+=1 # add a mobile device to count even if it wasn't updated within this time frame
            
            # IF WE WERE TO CHANGE HOW WE MEASURE, this would be on bulletins NOT the release dates
            #tempDF=tempDFs[(tempDFs.Release_Date >= yrBEG) & (tempDFs.Release_Date < yrEND)]# filtered release dates accordingly
            
            tempDF=tempDFs[(tempDFs.Bulletin_Level >= yrBEG) & (tempDFs.Bulletin_Level < yrEND)]
            
            actualUpdates=0
            if len(tempDF)>0: # meaning an update was reported but if DF==0, it didn't fall within that year
                for bul in tempDF.Bulletin_Level.unique(): # for all updated bulletins
                    #if pd.to_datetime(bul) >= yrBEG and pd.to_datetime(bul) < yrEND: #the bulletin was released within this timeframe
                    if True: # for the above
                        tmp=tempDF[tempDF.Bulletin_Level==bul]
                        if len(tmp)>1:
                            if len(tmp.Carrier.unique())>1:
                                for car in tmp.Carrier.unique(): 
                                    latencyPointsCar_total[car][index]=latencyPointsCar_total[car][index]+1
                            avgs=tmp.difference.mean()
                            latencyPoints[index].append(avgs)
                        else:
                            latencyPoints[index].extend(tmp.difference.values)
                            for car in tmp.Carrier.unique():
                                latencyPointsCar_total[car][index]=latencyPointsCar_total[car][index]+1
                        actualUpdates+=1 # for each update
                if actualUpdates>6:
                    print(str(yrBEG)+" : "+str(yrEND))
                    print(tempDF)
                    print()
                counts[index][0]+=actualUpdates
                indvCounts[index].append(actualUpdates)
                if actualUpdates == 0:
                    noUpdates[index]+=1
                else:
                    allModels.append(model)
                    allModelsperDuration[index].append(model)
            else:
                noUpdates[index]+=1
            indvCounts_all[index].append(actualUpdates)
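The counting loop above treats each distinct bulletin inside a window as one update. When several carriers report the same bulletin for a model, it averages their latencies. Below is a minimal sketch of that rule for one model and one window, with a hypothetical helper `updates_in_window` and toy data.

```python
import pandas as pd

# Hypothetical rows: S9 got the February bulletin from two carriers.
df = pd.DataFrame({
    "Model": ["S9", "S9", "S9", "G7"],
    "Carrier": ["AT&T", "Sprint", "AT&T", "AT&T"],
    "Bulletin_Level": pd.to_datetime(["2018-02-01", "2018-02-01", "2018-04-01", "2018-09-01"]),
    "difference": [10.0, 20.0, 5.0, 30.0],
})

def updates_in_window(df, model, start, end):
    """Return (number of distinct bulletins, per-bulletin latencies) for one model
    in [start, end); latencies from several carriers for one bulletin are averaged."""
    rows = df[(df["Model"] == model)
              & (df["Bulletin_Level"] >= start)
              & (df["Bulletin_Level"] < end)]
    per_bulletin = rows.groupby("Bulletin_Level")["difference"].mean()
    return len(per_bulletin), per_bulletin.tolist()

n, lat = updates_in_window(df, "S9", pd.Timestamp("2018-01-01"), pd.Timestamp("2018-07-01"))
```

A model with no bulletin in the window returns `(0, [])`, which corresponds to the `noUpdates` branch above.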
if "Verizon" not in carrierDF.Carrier.values: # only output if no Verizon
    if howManyMonths == 12:
        # Calculate general stats
        print("Number of updates, Number of phones")
        print(counts)
        print("Phones Updating:")
        updatedPh=[]
        for i in range(len(noUpdates)):
            updatedPh.append(counts[i][1]-noUpdates[i])
        print(updatedPh)
        print("No Updates:")
        print(noUpdates)
        print("Number of Updates Across each Year: ")
        print(indvCounts)

        for car in carrierDF.Carrier.unique():
            print(latencyPointsCar_total[car])
        print()
        #print("Updates with Difference Across each Year: ")
        #print(latencyPoints)
        print("----------------------------")
        totals=0
        for index, values in enumerate(counts):
            avg=values[0]/values[1]
            print("Avg bulletins at year " + str(index+1) + " " + str(avg))
            totals+=avg
        print("Average bulletins per year: " + str(totals/4))
    elif howManyMonths == 6:
        # Calculate general stats
        print("Number of updates, Number of phones")
        print(counts)
        print("Phones Updating:")
        updatedPh=[]
        for i in range(len(noUpdates)):
            updatedPh.append(counts[i][1]-noUpdates[i])
        print(updatedPh)
        print("No Updates:")
        print(noUpdates)
        print("----------------------------")
        totals=0
        for index, values in enumerate(counts):
            avg=values[0]/values[1]
            print("Avg bulletins at year " + str(index//2+1) + ", half " + str(index%2+1) + ": " + str(avg))
            totals+=avg
        print("Average bulletins per 6 months: " + str(totals/int((12/howManyMonths)*4)))

    elif howManyMonths == 3:
        # Calculate general stats
        print("Number of updates, Number of phones")
        print(counts)

        print("Phones Updating:")
        updatedPh=[]
        for i,value in enumerate(noUpdates):
            updatedPh.append(counts[i][1]-value)
        print(updatedPh)
        print("Phones Not Updating:")
        print(noUpdates)

        for car in carrierDF.Carrier.unique():
            print(latencyPointsCar_total[car])
        print()
        #print("Updates with Difference Across each Year: ")
        #print(latencyPoints)
        print("----------------------------")
        totals=0
        year=0
        for index, values in enumerate(counts):
            avg=values[0]/values[1]
            print("Avg bulletins at "+str(int(index%(12/howManyMonths))+1)+" partial year " + str(year+1) + " : " + str(avg))
            totals+=avg
            if int(index%(12/howManyMonths)+1)==4:
                year+=1
        print("Average bulletins per 3 months: " + str(totals/int((12/howManyMonths)*4)))


    print("Number of Unique Models Across Frequency: ")
    print(len(set(allModels)))

if howManyMonths == 12:
    latencyPointsCar={}
    for car in carrierDF.Carrier.unique():
        latencyPointsCar[car]={}
        for yr in range(4):
            latencyPointsCar[car][yr]=list()

    freqPointsCar={}
    freqPointsCarAllDevices={}
    for car in carrierDF.Carrier.unique():
        freqPointsCar[car]={}
        freqPointsCarAllDevices[car]={}
        for yr in range(4):
            freqPointsCar[car][yr]=list()
            freqPointsCarAllDevices[car][yr]=list()
elif howManyMonths==6 or howManyMonths==3:
    latencyPointsCar={}
    for car in carrierDF.Carrier.unique():
        latencyPointsCar[car]={}
        for yr in range(int(12/howManyMonths)*4):
            latencyPointsCar[car][yr]=list()

    freqPointsCar={}
    freqPointsCarAllDevices={}
    for car in carrierDF.Carrier.unique():
        freqPointsCar[car]={}
        freqPointsCarAllDevices[car]={}
        for yr in range(int(12/howManyMonths)*4):
            freqPointsCar[car][yr]=list()
            freqPointsCarAllDevices[car][yr]=list()
        
count=0
# for each mobile in row (2017-07-01 TO 2019-06-01 (if year fits within this time line, we calculate accordingly))
for car in carrierDF.Carrier.unique():
    tmpCar=carrierDF[carrierDF.Carrier==car]
    for model in tmpCar.Model.unique():
        tempDFs=tmpCar[tmpCar.Model==model]
        for index, year in enumerate(importantCols): # traverse through each year
            yrBEG=tempDFs[year[0]].iloc[0]#.unique()[0]
            yrEND=tempDFs[year[1]].iloc[0]#.unique()[0]

            if yrBEG >= THE_START_OF_TIMELINE and yrEND < THE_END_OF_TIMELINE: #the particular year is fully within our timeline :)
                # filtered release dates accordingly
                #tempDF=tempDFs[(tempDFs.Release_Date >= yrBEG) & (tempDFs.Release_Date < yrEND)]
                tempDF=tempDFs[(tempDFs.Bulletin_Level >= yrBEG) & (tempDFs.Bulletin_Level < yrEND)]
                
                actualUpdates=0
                if len(tempDF)>0: # meaning an update was reported but if DF==0, it didn't fall within that year
                    for bul in tempDF.Bulletin_Level.unique():
                        #if pd.to_datetime(bul) >= yrBEG and pd.to_datetime(bul) < yrEND:
                        if True: # comment this out if we filter OG by release date and not bulletin
                            tmp=tempDF[tempDF.Bulletin_Level==bul]
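                            if len(tmp)>1: # should not happen: duplicate carrier/model/bulletin rows were dropped earlier
                                print("Unexpected duplicate bulletin:", car, model, bul)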
                            if len(tmp)>0:
                                actualUpdates+=1 # for each update  
                                latencyPointsCar[car][index].extend(tmp.difference.values)
                    if actualUpdates>0:
                        freqPointsCar[car][index].append(actualUpdates)
                freqPointsCarAllDevices[car][index].append(actualUpdates) # this includes devices within the timeframe but maybe no updates
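The per-carrier loop above keeps two lists per period. `freqPointsCar` holds counts only for devices that received at least one update, while `freqPointsCarAllDevices` also includes devices with zero updates. The sketch below, on hypothetical counts, shows why the two averages differ.

```python
import statistics

# Hypothetical per-device update counts for one carrier in one period,
# including devices that received no update in that window.
all_devices = [3, 0, 2, 0, 1]

updated_only = [n for n in all_devices if n > 0]  # analogous to freqPointsCar
avg_all = statistics.mean(all_devices)            # analogous to freqPointsCarAllDevices
avg_updated = statistics.mean(updated_only)
```

The same 6 updates average 1.2 per device across all five devices but 2 per device across the three that were updated, so the choice of list changes the reported update frequency.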
# Plot latency points per update within year 1, 2, 3 etc...
data=[]
medians=[]
# For each period after release, plot the distribution of update latencies
for yr in range(int(12/howManyMonths)*4):
    print(yr)
    print(statistics.mean(latencyPoints[yr]))
    medians.append(statistics.median(latencyPoints[yr]))
    print()
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"

    data.append(go.Box(
            y=latencyPoints[yr],
            name=str(yr+1) +formatGood, #+" period"
            boxpoints = 'all',
            showlegend=False,
            marker=dict(color="rgb(31, 119, 180)")
        ))


# Edit the layout
layout = dict(#title = "Average Latency per 6 Month Periods after Release",
              yaxis = dict(title = 'Days',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period'),
              plot_bgcolor='rgba(0,0,0,0)',
              font=dict(size=20))

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='avg-latency-since-release-years.png')
# print("REMEMBER TO ADD VERIZON")

if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'avg-latency-since-release-years.pdf', width=1000, height=600)
    
    
############################################################################################################################
############################################################################################################################
############################################################################################################################
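# Sketch (hypothetical helper, not used by the notebook): the repeated
# `formatGood` if/elif chains and the `quarters` loops build period labels such
# as "1st", "2nd", "3rd", "4th". A single helper could return the same labels
# for the 4, 8 or 16 periods used here and, unlike the chains, would still be
# correct for "21st", "22nd", "23rd" if more than 20 periods were ever used.
def ordinal_label(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th-20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return str(n) + suffix

# Possible use: quarters = [ordinal_label(i + 1) for i in range(len(medians))]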

# LINE CHART INSTEAD
data=[]

quarters=[]
for y in range(len(medians)):
    x=y+1
    if x==1: quarters.append("1st")
    elif x==2: quarters.append("2nd")
    elif x==3: quarters.append("3rd")
    else: quarters.append(str(x) + "th")


data.append(go.Scatter(
            x=quarters,
            y=medians,
        ))


    
# Edit the layout
layout = dict(#title = "Median Latency of Updates per Period after Release",
              yaxis = dict(title = 'Median Latency of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)',range=[0,25]),
              #xaxis = dict(title = '6 Month Period'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              violinmode='group',
              legend_orientation="h",
              legend=dict(x=0.25, y=1.1),
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier-median.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'carrier-number-updates-released-per-year.pdf', width=1000, height=600)

data=[]
# for each post-release period, box-plot the per-device update counts
medians=[]


for yr in range(int(12/howManyMonths)*4): # box plot per period; medians feed the line chart that follows
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"

    medians.append(statistics.median(indvCounts_all[yr]))
    data.append(go.Box(
        y=indvCounts_all[yr],
        name=str(yr+1) +formatGood, #+" period"
        boxpoints = 'all',
        showlegend=False,
        marker=dict(color="rgb(31, 119, 180)")
    ))

# Edit the layout
layout = dict(#title = "Number of Updates after Release",
              yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period '),
              plot_bgcolor='rgba(0,0,0,0)',
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='frequency-since-release-years.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'frequency-since-release-years.pdf', width=1000, height=600)

    
############################################################################################################################
############################################################################################################################
############################################################################################################################

# LINE CHART INSTEAD
data=[]
quarters=[]
for y in range(len(medians)):
    x=y+1
    if x==1: quarters.append("1st")
    elif x==2: quarters.append("2nd")
    elif x==3: quarters.append("3rd")
    else: quarters.append(str(x) + "th")


data.append(go.Scatter(
            x=quarters,
            y=medians,
            marker=dict(color="rgb(31, 119, 180)")
        ))


    
# Edit the layout
layout = dict(#title = "Number of Updates per 6 Month Period after Release per Carrier",
              yaxis = dict(title = 'Median Frequency of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)',range=[0,2]),
              #xaxis = dict(title = '6 Month Period'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              violinmode='group',
              legend_orientation="h",
              legend=dict(x=0.25, y=1.1),
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier-median.png')


# if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
#     pio.write_image(fig, 'carrier-number-updates-released-per-year.pdf', width=1000, height=600)
# exception here

data=[]
total_updates={}
total_years={}

total_updates_wZeroes={}
total_years_wZeroes={}


for yr in range(int(12/howManyMonths)*4):

    properList_update=[]
    properList_years=[]
    properList_update_wZeroes=[]
    properList_years_wZeroes=[]
    for car in carrierDF.Carrier.unique():
        properList_update_wZeroes.extend(freqPointsCarAllDevices[car][yr])
        properList_years_wZeroes.extend([car for _ in range(len(freqPointsCarAllDevices[car][yr]))]) #+" period"
    
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_years_wZeroes[yr]=properList_years_wZeroes.copy()
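# Sketch (hypothetical helper, not used by the notebook): the loop above flattens
# the nested {carrier: {period: [counts]}} dict into two parallel lists, one label
# per value. That "long" layout is what go.Box expects when several traces share
# categorical x values; with layout boxmode='group', Plotly draws the box of each
# trace (here, each period) side by side within every carrier category.
def to_long_form(points_by_group, groups, period):
    """Return (xs, ys) for one period: ys holds every value, xs its group label."""
    xs, ys = [], []
    for group in groups:
        values = points_by_group[group][period]
        ys.extend(values)
        xs.extend([group] * len(values))
    return xs, ys

# Possible use: total_years_wZeroes[yr], total_updates_wZeroes[yr] = to_long_form(
#     freqPointsCarAllDevices, carrierDF.Carrier.unique(), yr)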
    

for yr in range(int(12/howManyMonths)*4):
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"

    data.append(go.Box(
        x=total_years_wZeroes[yr],
        y=total_updates_wZeroes[yr],
        name=str(yr+1)+formatGood,
        # marker=dict(color='rgb'+str(lvlColor[car]))
    ))
    
    
# Edit the layout
layout = dict(#title = "Number of Updates per 6 Month Period after Release per Carrier",
              yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              violinmode='group',
              legend_orientation="h",
              legend=dict(x=0.25, y=1.1),
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'carrier-number-updates-released-per-year.pdf', width=1000, height=600)

    
############################################################################################################################
############################################################################################################################
############################################################################################################################

# LINE CHART INSTEAD
from statistics import median
data=[]


poperCars_yr=[]
poperCars={}
for car in carrierDF.Carrier.unique():
    poperCars[car]=[]

for yr in range(int(12/howManyMonths)*4):
    poperCars_yr.append(yr)
    for car in carrierDF.Carrier.unique():
        poperCars[car].append(median(freqPointsCarAllDevices[car][yr])) # for each carrier, calculate median for ea section
        
for car in carrierDF.Carrier.unique():
    data.append(go.Scatter(
            x=poperCars_yr,
            y=poperCars[car],
            name=str(car),
            marker=dict(color='rgb'+str(lvlColor[car]))
        ))
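# Sketch (hypothetical helper, not used by the notebook): statistics.median raises
# statistics.StatisticsError on an empty list, so the per-carrier median loop above
# would raise if some carrier had no device whose period lies fully inside the
# timeline. Returning None instead should leave a gap in the go.Scatter line,
# since Plotly treats None y values as missing.
import statistics

def median_or_none(values):
    """Median of `values`, or None when there is nothing to summarize."""
    values = list(values)
    return statistics.median(values) if values else None

# Possible use: poperCars[car].append(median_or_none(freqPointsCarAllDevices[car][yr]))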

# Edit the layout
layout = dict(#title = "Number of Updates per 6 Month Period after Release per Carrier",
              yaxis = dict(title = 'Median Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              violinmode='group',
              legend_orientation="h",
              legend=dict(x=0.25, y=1.1),
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier-median.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'carrier-number-updates-released-per-year.pdf', width=1000, height=600)
# exception here

data=[]
total_updates={}
total_years={}

total_updates_wZeroes={}
total_years_wZeroes={}


for yr in range(int(12/howManyMonths)*4):

    properList_update=[]
    properList_years=[]
    properList_update_wZeroes=[]
    properList_years_wZeroes=[]
    for car in carrierDF.Carrier.unique():
        properList_update_wZeroes.extend(freqPointsCarAllDevices[car][yr])
        properList_years_wZeroes.extend([car for _ in range(len(freqPointsCarAllDevices[car][yr]))]) #+" period"
    
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_years_wZeroes[yr]=properList_years_wZeroes.copy()
    
total_x=[]
total_y=[]
for yr in range(int(12/howManyMonths)*4):
    total_x.extend(total_years_wZeroes[yr])
    total_y.extend(total_updates_wZeroes[yr])
            
data.append(go.Box(
    x=total_x,
    y=total_y,
    marker=dict(color="rgb(31, 119, 180)")
))
    
    
# Edit the layout
layout = dict(#title = "Number of Updates per 6 Month Period after Release per Carrier",
              yaxis = dict(title = 'Updates per 6-Months',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              violinmode='group',
              legend_orientation="h",
              legend=dict(x=0.25, y=1.1),
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'carrier-number-updates-released-per-year-CARONLY.pdf', width=1000, height=600)
# import statistics

# if howManyMonths == 12:
#     latencyPointsCar={}
#     for car in carrierDF.Carrier.unique():
#         latencyPointsCar[car]={}
#         for yr in range(4):
#             latencyPointsCar[car][yr]=list()

#     freqPointsCar={}
#     for car in carrierDF.Carrier.unique():
#         freqPointsCar[car]={}
#         for yr in range(4):
#             freqPointsCar[car][yr]=list()


if howManyMonths in (3, 6, 12): # int(12/howManyMonths)*4 periods cover four years for any of these
    latencyPointsMan={}
    freqPointsManAllDevices={}
    for man in carrierDF.Manufacture.unique():
        latencyPointsMan[man]={}
        freqPointsManAllDevices[man]={}
        for yr in range(int(12/howManyMonths)*4):
            latencyPointsMan[man][yr]=list()
            freqPointsManAllDevices[man][yr]=list()

    freqPointsMan={}
    for man in carrierDF.Manufacture.unique():
        freqPointsMan[man]={}
        for yr in range(int(12/howManyMonths)*4):
            freqPointsMan[man][yr]=list()
        
count=0
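# Sketch (hypothetical helper, not used by the notebook): the number of
# post-release periods is int(12/howManyMonths)*4, i.e. four years of 3-, 6- or
# 12-month windows (16, 8 or 4 periods). A dict comprehension builds the same
# {key: {period: []}} structure as the nested initialization loops above, with a
# fresh list per cell so appending to one manufacturer/period never affects another.
def init_period_lists(keys, how_many_months, years=4):
    n_periods = (12 // how_many_months) * years
    return {k: {p: [] for p in range(n_periods)} for k in keys}

# Possible use: latencyPointsMan = init_period_lists(carrierDF.Manufacture.unique(), howManyMonths)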
# for each manufacturer and model, count updates in every post-release period that lies fully within the 2017-07-01 to 2019-06-01 timeline
for man in carrierDF.Manufacture.unique():
    tmpMan=carrierDF[carrierDF.Manufacture==man]
    for model in tmpMan.Model.unique():
        tempDFs=tmpMan[tmpMan.Model==model]
        for index, year in enumerate(importantCols): # traverse through each year
            yrBEG=tempDFs[year[0]].iloc[0]#.unique()[0]
            yrEND=tempDFs[year[1]].iloc[0]#.unique()[0]

            if yrBEG >= THE_START_OF_TIMELINE and yrEND < THE_END_OF_TIMELINE: #the particular year/session is fully within our timeline :)
                # filtered release dates accordingly
                #tempDF=tempDFs[(tempDFs.Release_Date >= yrBEG) & (tempDFs.Release_Date < yrEND)]
                tempDF=tempDFs[(tempDFs.Bulletin_Level >= yrBEG) & (tempDFs.Bulletin_Level < yrEND)]
                
                actualUpdates=0
                if len(tempDF)>0: # at least one bulletin falls within this period; an empty frame means none did
                    for bul in tempDF.Bulletin_Level.unique():
#                         if pd.to_datetime(bul) >= yrBEG and pd.to_datetime(bul) < yrEND:
                        if True:
                            tmp=tempDF[tempDF.Bulletin_Level==bul]
                            if len(tmp)>1: #calculate the average
                                actualUpdates+=1 # for each update  
                                latencyPointsMan[man][index].append(statistics.mean(tmp.difference.values))
                            elif len(tmp)==1:
                                actualUpdates+=1 # for each update  
                                latencyPointsMan[man][index].extend(tmp.difference.values)
                        
                    if actualUpdates>0:
                        freqPointsMan[man][index].append(actualUpdates)
                freqPointsManAllDevices[man][index].append(actualUpdates) # this includes devices within the timeframe but maybe no updates
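# Sketch (hypothetical helper, not used by the notebook): in the manufacturer loop
# above, several rows for one model (likely the same model on different carriers)
# can share a Bulletin_Level; they count as a single update and contribute their
# mean latency, while a lone row contributes its latency as-is. Assumes `rows` is
# an iterable of (bulletin_level, latency_days) pairs for one model and period.
import statistics
from collections import defaultdict

def per_bulletin_latency(rows):
    grouped = defaultdict(list)
    for bulletin, latency in rows:
        grouped[bulletin].append(latency)
    # the mean of a single value is that value, so both branches above collapse here
    return {bulletin: statistics.mean(lats) for bulletin, lats in grouped.items()}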
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}


normedMan=[]
for man in carrierDF.Manufacture.unique():
    temp=carrierDF[carrierDF.Manufacture==man]
    if len(temp.Model.unique())>2: # only include manufacturers with 3 or more phone models
        normedMan.append(man)
importantMan=list(set(normedMan)) # get a list of important manufacturers
    
# sort important manufacturers by their number of update records (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]
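# Sketch (hypothetical helper, not used by the notebook): the filter-and-sort above
# keeps manufacturers with at least three distinct models and orders them by their
# number of rows in carrierDF, descending. sorted(zip(manLengths, importantMan),
# reverse=True) breaks ties by name in reverse order, which the (count, name) key
# below reproduces. Assumes `rows` is an iterable of (manufacturer, model) pairs,
# one per row.
from collections import defaultdict

def rank_manufacturers(rows, min_models=3):
    models = defaultdict(set)   # manufacturer -> distinct models
    counts = defaultdict(int)   # manufacturer -> number of rows
    for manufacturer, model in rows:
        models[manufacturer].add(model)
        counts[manufacturer] += 1
    kept = [m for m in models if len(models[m]) >= min_models]
    return sorted(kept, key=lambda m: (counts[m], m), reverse=True)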
   

for yr in range(int(12/howManyMonths)*4):# for each year range,
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"   
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()
    
for yr in range(int(12/howManyMonths)*4): # per each year
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"
        
    data.append(go.Box(
            x=total_man_wZeroes[yr], # needs to be each manufacturer
            y=total_updates_wZeroes[yr],
            name=str(yr+1)+formatGood,
            #marker=dict(color='rgb'+str(lvlColor[man]))
        ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq.pdf', width=1100, height=600)
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}


print(len(carrierDF.Manufacture.unique()))
normedMan=[]
for man in carrierDF.Manufacture.unique():
    normedMan.append(man)
importantMan=list(set(normedMan)) # every manufacturer here (no model-count filter in this variant)
    
# sort manufacturers by their number of update records (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]
   

for yr in range(int(12/howManyMonths)*4):# for each year range,
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"   
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()
    
for yr in range(int(12/howManyMonths)*4): # per each year
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"
        
    data.append(go.Box(
            x=total_man_wZeroes[yr], # needs to be each manufacturer
            y=total_updates_wZeroes[yr],
            name=str(yr+1)+formatGood,
            #marker=dict(color='rgb'+str(lvlColor[man]))
        ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')

print("REMEMBER TO TAKE OUT VERIZON")

if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq-all.pdf', width=1100, height=600)  
# Get the max of each manufacturer's per-device update counts
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}

# keep only manufacturers with three or more phone models
print(len(carrierDF.Manufacture.unique()))
normedMan=[]
for man in carrierDF.Manufacture.unique():
    temp=carrierDF[carrierDF.Manufacture==man]
    if len(temp.Model.unique())>2: # only include manufacturers with 3 or more phone models
        normedMan.append(man)
importantMan=list(set(normedMan)) # get a list of important manufacturers
    
# sort important manufacturers by their number of update records (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]
   

for yr in range(int(12/howManyMonths)*4):# for each year range,
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        #for calculating median
        #properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        #properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"  
        
        # For the max value
        if len(freqPointsManAllDevices[man][yr])>0:
            properList_update_wZeroes.append(max(freqPointsManAllDevices[man][yr]))
            properList_manu_wZeroes.append(man)
        
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()
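# Sketch (hypothetical helper, not used by the notebook): the bar charts collapse
# each manufacturer's per-device update counts for a period into one value (the
# max here, the median in a later variant), skipping manufacturers with no
# devices in that period. Assumes the same {group: {period: [counts]}} layout as
# freqPointsManAllDevices.
import statistics

def summarize_per_group(points_by_group, groups, period, how="max"):
    reducer = {"max": max, "median": statistics.median}[how]
    xs, ys = [], []
    for group in groups:
        values = points_by_group[group][period]
        if values:  # empty lists are skipped, as in the loop above
            xs.append(group)
            ys.append(reducer(values))
    return xs, ys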
    
for yr in range(int(12/howManyMonths)*4): # per each year
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"
        
#     print(yr)
#     print(total_man_wZeroes[yr])
#     print(total_updates_wZeroes[yr])
    data.append(go.Bar(
            x=total_man_wZeroes[yr], # needs to be each manufacturer
            y=total_updates_wZeroes[yr],
            name=str(yr+1)+formatGood,
            #marker=dict(color='rgb'+str(lvlColor[man]))
        ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')

print("REMEMBER TO TAKE OUT VERIZON")

if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq-max.pdf', width=1100, height=600)
# Only the manufacturers shared across all carriers

# Get the max of each manufacturer's per-device update counts
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}
normedMan=[]

# # remove manufacturers with two or fewer phone models
# print(len(carrierDF.Manufacture.unique()))
# normedMan=[]
# for man in carrierDF.Manufacture.unique():
#     temp=carrierDF[carrierDF.Manufacture==man]
# #     if len(temp.Model.unique())>2: # only include manufacturers with 3 or more phone models
#     normedMan.append(man)
# importantMan=list(set(normedMan)) # get a list of important manufacturers
    
# Only include manufacturers shared across all carriers
for man in carrierDF.Manufacture.unique():
    temp=carrierDF[carrierDF.Manufacture==man]
    if len(temp.Carrier.unique())>2: # present on more than two carriers, i.e. all 3 carriers we care about
        normedMan.append(man)
importantMan=list(set(normedMan))    
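# Sketch (hypothetical helper, not used by the notebook): the filter above keeps a
# manufacturer only if its rows span more than two distinct carriers, which the
# comment equates with all three carriers under study. Assumes `rows` is an
# iterable of (manufacturer, carrier) pairs; the result is sorted by name only to
# make it deterministic (the notebook re-sorts by row count afterwards).
from collections import defaultdict

def shared_manufacturers(rows, min_carriers=3):
    carriers = defaultdict(set)  # manufacturer -> distinct carriers seen
    for manufacturer, carrier in rows:
        carriers[manufacturer].add(carrier)
    return sorted(m for m, cs in carriers.items() if len(cs) >= min_carriers)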
 
    
# sort important manufacturers by their number of update records (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]
   

for yr in range(int(12/howManyMonths)*4):# for each year range,
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        #for calculating median
        #properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        #properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"  
        
        # For the max value
        if len(freqPointsManAllDevices[man][yr])>0:
            properList_update_wZeroes.append(max(freqPointsManAllDevices[man][yr]))
            properList_manu_wZeroes.append(man)
        
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()
    
for yr in range(int(12/howManyMonths)*4): # per each year
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"
        
#     print(yr)
#     print(total_man_wZeroes[yr])
#     print(total_updates_wZeroes[yr])
    data.append(go.Bar(
            x=total_man_wZeroes[yr], # needs to be each manufacturer
            y=total_updates_wZeroes[yr],
            name=str(yr+1)+formatGood,
            #marker=dict(color='rgb'+str(lvlColor[man]))
        ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Max Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq-shared-max.pdf', width=1100, height=600)
# Only the manufacturers shared across all carriers

# Get the median of each manufacturer's per-device update counts
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}
normedMan=[]

# # remove manufacturers with two or fewer phone models
# print(len(carrierDF.Manufacture.unique()))
# normedMan=[]
# for man in carrierDF.Manufacture.unique():
#     temp=carrierDF[carrierDF.Manufacture==man]
# #     if len(temp.Model.unique())>2: # only include manufacturers with 3 or more phone models
#     normedMan.append(man)
# importantMan=list(set(normedMan)) # get a list of important manufacturers
    
# Only include manufacturers shared across all carriers
for man in carrierDF.Manufacture.unique():
    temp=carrierDF[carrierDF.Manufacture==man]
    if len(temp.Carrier.unique())>2: # present on more than two carriers, i.e. all 3 carriers we care about
        normedMan.append(man)
importantMan=list(set(normedMan))    
 
    
# sort important manufacturers by their number of update records (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]
   

for yr in range(int(12/howManyMonths)*4):# for each year range,
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        #for calculating median
        #properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        #properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"  
        
        # For the median value
        if len(freqPointsManAllDevices[man][yr])>0:
            properList_update_wZeroes.append(statistics.median(freqPointsManAllDevices[man][yr]))
            properList_manu_wZeroes.append(man)
        
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()
    
for yr in range(int(12/howManyMonths)*4): # per each year
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"
        
#     print(yr)
#     print(total_man_wZeroes[yr])
#     print(total_updates_wZeroes[yr])
    data.append(go.Bar(
            x=total_man_wZeroes[yr], # needs to be each manufacturer
            y=total_updates_wZeroes[yr],
            name=str(yr+1)+formatGood,
            #marker=dict(color='rgb'+str(lvlColor[man]))
        ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Median Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')


if not "Verizon" in carrierDF.Carrier.values and saveFigs==True: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq-shared-median.pdf', width=1100, height=600)
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}


normedMan=[]
for man in carrierDF.Manufacture.unique():
    temp=carrierDF[carrierDF.Manufacture==man]
    if len(temp.Carrier.unique())>2: # present on more than two carriers, i.e. all 3 carriers we care about
        normedMan.append(man)
importantMan=list(set(normedMan)) # get a list of important manufacturers
    
# sort important manufacturers by their number of update records (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]
   

for yr in range(int(12/howManyMonths)*4):# for each year range,
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"   
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()
    
for yr in range(int(12/howManyMonths)*4): # per each year
    formatGood="th"
    if yr==0:
        formatGood="st"
    elif yr==1:
        formatGood="nd"
    elif yr==2:
        formatGood="rd"
        
    data.append(go.Box(
            x=total_man_wZeroes[yr], # needs to be each manufacturer
            y=total_updates_wZeroes[yr],
            name=str(yr+1)+formatGood,
            #marker=dict(color='rgb'+str(lvlColor[man]))
        ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Number of Updates',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')

print("REMEMBER TO TAKE OUT VERIZON")
print("REMEMBER TO TAKE OUT VERIZON")
print("REMEMBER TO TAKE OUT VERIZON")

if "Verizon" not in carrierDF.Carrier.values and saveFigs: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq-shared.pdf', width=1100, height=600)
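The cell above keeps only manufacturers that appear with more than two carriers, orders them by how many update records they have, and then flattens each period's per-device update counts into parallel x/y lists for plotly's grouped box plot. The following is a minimal, stdlib-only sketch of those two steps. It assumes, as the code suggests, that `freqPointsManAllDevices[man][period]` is a list of per-device update counts; the helper names `select_manufacturers` and `flatten_period` and the toy rows are hypothetical, not taken from the paper's code.

```python
from collections import Counter, defaultdict

def select_manufacturers(rows, min_carriers=3):
    """Keep manufacturers seen with at least `min_carriers` distinct carriers,
    ordered by number of update rows, most first (hypothetical helper)."""
    carriers = defaultdict(set)  # manufacturer -> distinct carriers
    counts = Counter()           # manufacturer -> number of update rows
    for manufacturer, carrier in rows:
        carriers[manufacturer].add(carrier)
        counts[manufacturer] += 1
    kept = [m for m in carriers if len(carriers[m]) >= min_carriers]
    # sorted() is stable: ties keep first-seen order (the original breaks ties by name)
    return sorted(kept, key=lambda m: counts[m], reverse=True)

def flatten_period(freq_points, manufacturers, period):
    """Turn freq_points[m][period] (assumed: list of per-device update counts)
    into parallel x (manufacturer) and y (count) lists for a grouped box plot."""
    xs, ys = [], []
    for m in manufacturers:
        values = freq_points[m][period]
        ys.extend(values)
        xs.extend([m] * len(values))
    return xs, ys

# Hypothetical toy data: one (manufacturer, carrier) tuple per update record
rows = [("A", "c1"), ("A", "c2"), ("A", "c3"), ("A", "c1"),
        ("B", "c1"), ("B", "c2"),
        ("C", "c1"), ("C", "c2"), ("C", "c3")]
manufacturers = select_manufacturers(rows)

months_per_period = 6
n_periods = (12 // months_per_period) * 4  # 4 years of 6-month periods
freq_points = {m: [[1, 2] for _ in range(n_periods)] for m in manufacturers}
xs, ys = flatten_period(freq_points, manufacturers, 0)
print(manufacturers, n_periods, xs, ys)
```

Manufacturer "B" is dropped because it appears with only two carriers, mirroring the `>2` filter in the original cell.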
data=[]
total_updates={}
total_years={}
total_updates_wZeroes={}
total_man_wZeroes={}


normedMan=[]
for man in carrierDF.Manufacture.unique():
    temp=carrierDF[carrierDF.Manufacture==man]
    if len(temp.Carrier.unique())>2: # sold by 3+ carriers, i.e. all 3 carriers we care about
        normedMan.append(man)
importantMan=list(set(normedMan)) # get a list of important manufacturers
    
# sort important manufacturers by number of update rows (descending)
manLengths=[]
for man in importantMan:
    manLengths.append(len(carrierDF[carrierDF.Manufacture==man]))
importantMan = [x for _,x in sorted(zip(manLengths,importantMan),reverse=True)]

    
for yr in range(int(12/howManyMonths)*4): # for each period (12/howManyMonths periods per year, over 4 years)
    properList_update_wZeroes=[]
    properList_manu_wZeroes=[]
    for man in importantMan:# for each manufacturer
        properList_update_wZeroes.extend(freqPointsManAllDevices[man][yr])
        properList_manu_wZeroes.extend([man for _ in range(len(freqPointsManAllDevices[man][yr]))]) # +" period"   
    total_updates_wZeroes[yr]=properList_update_wZeroes.copy()
    total_man_wZeroes[yr]=properList_manu_wZeroes.copy()

    
eaMan_x=[]
eaMan_y=[]
for yr in range(int(12/howManyMonths)*4): # pool every period into one list per manufacturer
    eaMan_x.extend(total_man_wZeroes[yr])
    eaMan_y.extend(total_updates_wZeroes[yr])
    
  
    
        
data.append(go.Box(
        x=eaMan_x, # needs to be each manufacturer
        y=eaMan_y,
        marker=dict(color="rgb(31, 119, 180)")
    ))
    
# Edit the layout
colorway=['rgb(60, 180, 75)', 'rgb(230, 25, 75)', 'rgb(0, 130, 200)', 'rgb(245, 130, 48)', 'rgb(145, 30, 180)', 'rgb(240, 50, 230)', 'rgb(0, 128, 128)', 'rgb(170, 110, 40)', 'rgb(0, 130, 200)', 'rgb(128, 128, 128)','rgb(128, 0, 0)', 'rgb(170, 255, 195)', 'rgb(128, 128, 0)', 'rgb(255, 215, 180)', 'rgb(0, 0, 128)', 'rgb(128, 128, 128)', 'rgb(255, 255, 255)','rgb(230, 25, 75)',  'rgb(0, 0, 0)','rgb(255, 225, 25)',  'rgb(255, 250, 200)', 'rgb(250, 190, 190)']
layout = dict(#title = "Number of True Updates per 6 Month Period after Release per Manufacture ",
              yaxis = dict(title = 'Updates per 6-Months',showgrid=True, gridcolor='rgb(219, 219, 219)'),
              #xaxis = dict(title = '6 Month Period',showgrid=True,zeroline=True,showline=True,zerolinecolor='#969696'),
              plot_bgcolor='rgba(0,0,0,0)',
              boxmode='group',
              colorway=colorway,
              legend_orientation="h",
              legend=dict(x=0.24, y=1.15), # for pt .33
              font=dict(size=20)
              )

fig = dict(data=data, layout=layout)
py.offline.iplot(fig, filename='release-years-frequency-per-carrier.png')

print("REMEMBER TO TAKE OUT VERIZON")
print("REMEMBER TO TAKE OUT VERIZON")
print("REMEMBER TO TAKE OUT VERIZON")

if "Verizon" not in carrierDF.Carrier.values and saveFigs: # only save if no Verizon
    pio.write_image(fig, 'manufacture-freq-perman.pdf', width=1100, height=600)
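The second cell draws one box per manufacturer instead of one per period: it concatenates every period's counts into `eaMan_x`/`eaMan_y` before building a single `go.Box`. As a plotting-library-free sketch, using the same assumption about the structure of `freqPointsManAllDevices` along with hypothetical helper names and made-up counts, the pooling step and the statistics a box summarizes look like this:

```python
from statistics import quantiles

def pool_periods(freq_points, manufacturers, n_periods):
    """Concatenate all periods' per-device counts for each manufacturer,
    i.e. drop the per-period grouping as the second plot does."""
    return {m: [v for p in range(n_periods) for v in freq_points[m][p]]
            for m in manufacturers}

def box_summary(values):
    """Five-number summary drawn by a box plot. Quartiles use
    statistics.quantiles(method="inclusive"); plotly may interpolate differently."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

# Hypothetical counts: 2 periods, 3 devices per period
freq_points = {"A": [[0, 1, 2], [1, 1, 3]], "B": [[2, 2, 4], [0, 2, 2]]}
pooled = pool_periods(freq_points, ["A", "B"], n_periods=2)
for m, vals in pooled.items():
    print(m, box_summary(vals))
```

The pooled box hides whether a manufacturer's update frequency falls off in later periods. That trend is what the grouped per-period plot from the first cell is for.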