1. Task description
Scrape company personnel information from a specified website and write the data to a JSON file.
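2. Core techniques
- Use Jsoup to select elements from web pages
- Use Alibaba's fastjson to read and write the JSON file
- Use recursion to record company, department, and employee information level by level
- Use reflection to look up records in the objects loaded from the JSON file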
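3. Implementation approach
(1) Organization structure classes
Create Company, Department, and Employee classes to hold the information for each level of the hierarchy. Company and Department implement the GetBranches interface, which retrieves the branches of the current object; for Company, these are the Department objects that belong to it. Sample code for the Company class follows: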
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Company implements GetBranches<Department> {
        private String name;
        private String id;
        private List<Department> departments;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getId() {
            return id;
        }

        public void setId(String id) {
            this.id = id;
        }

        public List<Department> getDepartments() {
            return departments;
        }

        public void setDepartments(List<Department> departments) {
            this.departments = departments;
        }

        // Returns the departments already loaded; looked up by name via reflection in GetInfo
        public List<Department> getBranch() {
            return this.departments;
        }

        // Scrapes the department table at url and loads each department's employees
        @Override
        public List<Department> getBranches(String url) throws IOException {
            List<Department> departmentList = new ArrayList<>();
            Document document = Jsoup.connect(url).get();
            Elements elements = document.getElementsByClass("xwtable").select("tr");
            // Start at 1 to skip the table's header row
            for (int i = 1; i < elements.size(); i++) {
                Element element = elements.get(i);
                Department department = new Department();
                department.setName(element.select("td").get(1).text());
                department.setId(element.select("span").text());
                // "abs:href" resolves a relative link against the page URL,
                // because Jsoup.connect needs an absolute URL
                String employeeUrl = element.select("td").get(0).select("a").attr("abs:href");
                department.setEmployees(department.getBranches(employeeUrl));
                departmentList.add(department);
            }
            return departmentList;
        }

        @Override
        public String toString() {
            return "company{" +
                    "id='" + id + '\'' +
                    ", name='" + name + '\'' +
                    ", departments=" + departments +
                    '}';
        }
    }
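
The GetBranches interface itself is not listed in this write-up. Judging only from the @Override method in Company, it could be declared roughly as shown below. This is a sketch under assumptions, not the author's code: the Corp, Dept, and Person classes and the in-memory PAGES map are hypothetical stand-ins for Company, Department, Employee, and the website, used only to show how each level delegates to the next without needing Jsoup or a network connection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Assumed shape of the interface, inferred from Company#getBranches(String).
interface GetBranches<T> {
    List<T> getBranches(String url) throws IOException;
}

final class BranchSketch {
    // Hypothetical stand-in for the website: each "URL" maps to the names listed on that page.
    static final Map<String, List<String>> PAGES = Map.of(
            "/company/01", List.of("Sales", "Engineering"),
            "/department/Sales", List.of("Alice"),
            "/department/Engineering", List.of("Bob", "Carol"));

    // Leaf level: it has no branches, so it does not implement GetBranches.
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    // Middle level: its branches are people.
    static final class Dept implements GetBranches<Person> {
        final String name;
        List<Person> people = new ArrayList<>();
        Dept(String name) { this.name = name; }

        @Override
        public List<Person> getBranches(String url) throws IOException {
            List<Person> result = new ArrayList<>();
            for (String n : lookup(url)) {
                result.add(new Person(n));
            }
            return result;
        }
    }

    // Top level: its branches are departments, and each department loads its own people.
    static final class Corp implements GetBranches<Dept> {
        @Override
        public List<Dept> getBranches(String url) throws IOException {
            List<Dept> result = new ArrayList<>();
            for (String n : lookup(url)) {
                Dept d = new Dept(n);
                d.people = d.getBranches("/department/" + n);
                result.add(d);
            }
            return result;
        }
    }

    // Plays the role of Jsoup.connect(url).get(): fails for an unknown page.
    static List<String> lookup(String url) throws IOException {
        List<String> rows = PAGES.get(url);
        if (rows == null) {
            throw new IOException("no such page: " + url);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        for (Dept d : new Corp().getBranches("/company/01")) {
            System.out.println(d.name + ": " + d.people.size() + " employee(s)");
        }
    }
}
```

Keeping the leaf class outside the interface matches the description above, where only Company and Department implement GetBranches.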
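(2) Utility class for writing and reading data
The getDocFromUrl method writes the data scraped from the web pages to a JSON file. The getInfoFromJson method searches the organization structure objects loaded from the JSON file and prints the record that matches the id the user enters. The code is as follows: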
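    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.lang.reflect.Field;
    import java.lang.reflect.Method;
    import java.util.ArrayList;
    import java.util.List;

    import com.alibaba.fastjson.JSON;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class GetInfo {
        // Scrapes the company table at url and writes the whole hierarchy to file as JSON
        public static void getDocFromUrl(String url, File file) throws IOException {
            Document document = Jsoup.connect(url).get();
            Elements elements = document.getElementsByClass("xwtable").select("tr");
            List<Company> companyList = new ArrayList<>();
            // Start at 1 to skip the table's header row
            for (int i = 1; i < elements.size(); i++) {
                Company company = new Company();
                Element element = elements.get(i);
                company.setName(element.select("td").get(1).text());
                company.setId(element.select("span").text());
                // "abs:href" resolves a relative link against the page URL
                String departmentUrl = element.select("td").get(0).select("a").attr("abs:href");
                company.setDepartments(company.getBranches(departmentUrl));
                companyList.add(company);
            }
            // try-with-resources closes the stream even if the write fails.
            // getBytes() uses the platform default charset, which is also how main decodes the file.
            try (FileOutputStream fileOutputStream = new FileOutputStream(file, false)) {
                fileOutputStream.write(JSON.toJSONString(companyList).getBytes());
            }
        }

        // Prints the object whose id equals str, descending one level
        // whenever str starts with the id of an object in the list
        public static void getInfoFromJson(List<?> list, String str) throws Exception {
            // Guard: list.get(0) below would throw on an empty branch
            if (list == null || list.isEmpty()) {
                System.out.println("Wrong id!");
                return;
            }
            Field field = list.get(0).getClass().getDeclaredField("id");
            Method method = list.get(0).getClass().getDeclaredMethod("toString");
            field.setAccessible(true); // id is private
            boolean flag = false;
            for (Object obj : list) {
                String id = (String) field.get(obj);
                if (str.equals(id)) {
                    System.out.println(method.invoke(obj));
                    flag = true;
                    break;
                } else if (str.startsWith(id)) {
                    // The target lies below this object: search its branches
                    Method getBranch = obj.getClass().getDeclaredMethod("getBranch");
                    List<?> branch = (List<?>) getBranch.invoke(obj);
                    getInfoFromJson(branch, str);
                    flag = true;
                    break;
                }
            }
            if (!flag) {
                System.out.println("Wrong id!");
            }
        }
    }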
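
To see the prefix-matching recursion of getInfoFromJson in isolation, here is a small sketch that needs neither Jsoup nor fastjson. It assumes, as the startsWith check suggests, that a child's id begins with its parent's id. The Node and Leaf classes, the sample ids, and the find method are hypothetical. Unlike the listing above, it returns the match instead of printing it, and it treats an object without a getBranch method as a leaf rather than letting NoSuchMethodException propagate.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class IdLookupSketch {
    // Hypothetical node with children; mirrors Company/Department (private id, getBranch()).
    static class Node {
        private final String id;
        private final List<?> children;

        Node(String id, Object... children) {
            this.id = id;
            this.children = Arrays.asList(children);
        }

        public List<?> getBranch() { return children; }

        @Override
        public String toString() { return "Node{id='" + id + "'}"; }
    }

    // Hypothetical leaf; it has an id but no getBranch().
    static class Leaf {
        private final String id;

        Leaf(String id) { this.id = id; }

        @Override
        public String toString() { return "Leaf{id='" + id + "'}"; }
    }

    // Returns the object whose id equals target, or null if there is none.
    static Object find(List<?> list, String target) throws ReflectiveOperationException {
        for (Object obj : list) {
            Field field = obj.getClass().getDeclaredField("id");
            field.setAccessible(true); // the field is private
            String id = (String) field.get(obj);
            if (target.equals(id)) {
                return obj;
            }
            if (target.startsWith(id)) {
                Method getBranch;
                try {
                    getBranch = obj.getClass().getDeclaredMethod("getBranch");
                } catch (NoSuchMethodException e) {
                    return null; // a leaf has nothing below it
                }
                // Descend one level and repeat the same search
                return find((List<?>) getBranch.invoke(obj), target);
            }
        }
        return null;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        List<Node> companies = Collections.singletonList(
                new Node("01", new Node("0101", new Leaf("010101"), new Leaf("010102"))));
        System.out.println(find(companies, "010102")); // prints Leaf{id='010102'}
        System.out.println(find(companies, "0199"));   // prints null
    }
}
```

Returning the result rather than printing inside the recursion keeps the search easy to test and leaves the decision about the "Wrong id!" message to the caller.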
(3) Main class
Sets the page URL and handles user input.
    import java.io.File;
    import java.nio.file.Files;
    import java.util.List;
    import java.util.Scanner;

    import com.alibaba.fastjson.JSON;

    public class DoWorm {
        public static void main(String[] args) throws Exception {
            File file = new File("./result.json");
            // Scrape only when no cached result exists
            if (!file.exists()) {
                String url = "http://localhost:8080/company";
                GetInfo.getDocFromUrl(url, file);
            }
            // Read the whole file in one call. The original read loop never
            // terminated on an empty file and never closed the stream.
            // The bytes are decoded with the platform default charset,
            // matching getBytes() in getDocFromUrl.
            byte[] bytes = Files.readAllBytes(file.toPath());
            String jsonText = new String(bytes);
            List<Company> companyList = JSON.parseArray(jsonText, Company.class);
            Scanner sc = new Scanner(System.in);
            // Answer one id query per line until the user enters -1 or input ends
            while (sc.hasNextLine()) {
                String str = sc.nextLine();
                if ("-1".equals(str)) {
                    break;
                }
                GetInfo.getInfoFromJson(companyList, str);
            }
        }
    }
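
The input handling in main can be tried out separately from the file and network access. The sketch below uses a hypothetical readQueries helper, which is not part of the original code, to read lines until the sentinel -1 or the end of input. Skipping blank lines is an added assumption about what the user would want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class QueryLoopSketch {
    // Collects ids line by line until "-1" or the end of input (hypothetical helper).
    static List<String> readQueries(Scanner sc) {
        List<String> ids = new ArrayList<>();
        while (sc.hasNextLine()) {
            String line = sc.nextLine().trim();
            if ("-1".equals(line)) {
                break;
            }
            if (!line.isEmpty()) {
                ids.add(line);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // A String stands in for System.in so the example runs without typing.
        Scanner sc = new Scanner("01\n0101\n-1\nignored\n");
        System.out.println(readQueries(sc)); // prints [01, 0101]
    }
}
```

In the real program each collected id would be passed to GetInfo.getInfoFromJson instead of being stored in a list.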