I'm new to Python and am still trying to tear myself away from C++ coding techniques while in Python, so please forgive me if this is a trivial question. I can't seem to find the most Pythonic way of doing this.
I have two lists of dicts. The individual dicts in both lists may contain nested dicts. (It's actually some Yelp data, if you're curious.) The first list of dicts contains entries like this:
{'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ',
'categories': ['Restaurants'],
'type': 'business',
...}
The second list of dicts contains entries like this:
{'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA',
'date': '2010-03-22',
'review_id': 'RF6UnRTtG7tWMcrO2GEoAg',
'stars': 2,
'text': "This is a basic review",
...}
What I would like to do is extract all the entries in the second list that match specific categories in the first list. For example, if I'm interested in restaurants, I only want the entries in the second list whose business_id matches a business_id in the first list whose categories list contains 'Restaurants'.
If I had these two lists as tables in SQL, I'd join them on the business_id attribute and then apply a simple filter to get the rows I want (WHERE 'Restaurants' IN categories, or something similar).
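For comparison, the join-then-filter approach can be sketched with Python's built-in sqlite3 module and an in-memory database. This is a minimal sketch under assumptions: the table names, the column subset (review_id, business_id, stars, review_text) and the function name reviews_in_category_sql are all hypothetical. Because SQL has no list column, categories is stored as one row per (business, category) pair.

```python
import sqlite3


def reviews_in_category_sql(businesses, reviews, category="Restaurants"):
    """Join reviews to businesses on business_id and keep reviews of businesses in `category`.

    Hypothetical schema: each business dict has 'business_id' and a 'categories' list;
    each review dict has 'review_id', 'business_id', 'stars' and 'text'.
    """
    conn = sqlite3.connect(":memory:")  # a file path here would keep the data on disk instead
    try:
        # A list column has no direct SQL equivalent, so store one row per (business, category).
        conn.execute("CREATE TABLE business_category (business_id TEXT, category TEXT)")
        conn.execute(
            "CREATE TABLE review (review_id TEXT, business_id TEXT, stars INTEGER, review_text TEXT)"
        )
        conn.executemany(
            "INSERT INTO business_category VALUES (?, ?)",
            ((b["business_id"], c) for b in businesses for c in b["categories"]),
        )
        conn.executemany(
            "INSERT INTO review VALUES (?, ?, ?, ?)",
            ((r["review_id"], r["business_id"], r["stars"], r["text"]) for r in reviews),
        )
        # Index the filter and join columns so the query does not scan every category row.
        conn.execute("CREATE INDEX idx_category ON business_category (category, business_id)")
        # The JOIN matches business_id; the WHERE clause is the category filter.
        return conn.execute(
            "SELECT r.review_id, r.business_id, r.stars, r.review_text "
            "FROM review AS r "
            "JOIN business_category AS bc ON r.business_id = bc.business_id "
            "WHERE bc.category = ? "
            "ORDER BY r.review_id",
            (category,),
        ).fetchall()
    finally:
        conn.close()
```

Each result row is a tuple. If a business lists the same category twice, the JOIN would return its reviews twice; a `WHERE r.business_id IN (SELECT ...)` subquery avoids that.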
These two lists are extremely large, so I'm running into both efficiency and memory space issues. Before I go and shove all of this into a SQL database, can anyone give me some pointers? I've messed around with Pandas some, so I do have some limited experience with that. I was having trouble with the merge process.
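On the memory side, one option is to avoid building either list at all. The sketch below assumes, as an unverified assumption, that the data sits in files with one JSON object per line; the function names are hypothetical. Only the set of matching business_ids is kept in memory, and reviews are yielded one at a time by a generator.

```python
import json
from typing import Iterable, Iterator, Set


def ids_in_category(business_lines: Iterable[str], category: str = "Restaurants") -> Set[str]:
    """Collect the business_id of every business whose categories include `category`."""
    ids: Set[str] = set()
    for line in business_lines:
        if not line.strip():  # tolerate blank lines in the file
            continue
        business = json.loads(line)
        if category in business.get("categories", []):
            ids.add(business["business_id"])
    return ids


def reviews_for(review_lines: Iterable[str], wanted_ids: Set[str]) -> Iterator[dict]:
    """Lazily yield reviews whose business_id is in wanted_ids, parsing one line at a time."""
    for line in review_lines:
        if not line.strip():
            continue
        review = json.loads(line)
        if review["business_id"] in wanted_ids:
            yield review


# Hypothetical usage; the file names are placeholders:
# with open("businesses.json") as f:
#     wanted = ids_in_category(f)
# with open("reviews.json") as f:
#     for review in reviews_for(f, wanted):
#         ...
```

An open text file iterates line by line, so passing it directly keeps memory use proportional to the set of IDs rather than to the number of reviews.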
Solution
Suppose your lists are called l1 and l2:
All elements from l1:
[each for each in l1]
All elements from l1 in the Restaurants category:
[each for each in l1
if 'Restaurants' in each['categories']]
All elements from l2 whose business_id matches an element of l1 in the Restaurants category:
[x for each in l1 for x in l2
if 'Restaurants' in each['categories']
and x['business_id'] == each['business_id'] ]
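The comprehension above compares every element of l1 with every element of l2, which is O(len(l1) × len(l2)) and will be slow on very large lists. A sketch of the same filter using a set of matching IDs (the function name restaurant_reviews is hypothetical) needs only one pass over each list, since set membership checks are O(1) on average:

```python
def restaurant_reviews(l1, l2, category="Restaurants"):
    """Return the elements of l2 whose business_id belongs to a business in `category`."""
    # One pass over l1: collect the IDs of businesses in the category.
    wanted_ids = {b["business_id"] for b in l1 if category in b["categories"]}
    # One pass over l2: a hash lookup per element instead of an inner loop over l1.
    return [x for x in l2 if x["business_id"] in wanted_ids]
```

One difference from the nested comprehension: results come out in l2's order rather than grouped by the order of businesses in l1.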