week 3 assignment 3 python for data science


shantanuatgit / Assignment3.py


	Assignment 3 - More Pandas

	This assignment requires more individual learning than the last one did - you are encouraged to check out the pandas documentation to find functions or methods you might not have used yet, or ask questions on Stack Overflow and tag them as pandas and python related. And of course, the discussion forums are open for interaction with your peers and the course staff.
	Question 1 (20%)

	Load the energy data from the file Energy Indicators.xls, which is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013, and should be put into a DataFrame with the variable name of energy.

	Keep in mind that this is an Excel file, and not a comma separated values file. Also, make sure to exclude the footer and header information from the datafile. The first two columns are unnecessary, so you should get rid of them, and you should change the column labels so that the columns are:

	['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']

	Convert Energy Supply to gigajoules (there are 1,000,000 gigajoules in a petajoule). For all countries which have missing data (e.g. data with "...") make sure this is reflected as np.NaN values.

	Rename the following list of countries (for use in later questions):

	"Republic of Korea": "South Korea",
	"United States of America": "United States",
	"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
	"China, Hong Kong Special Administrative Region": "Hong Kong"

	There are also several countries with numbers and/or parentheses in their name. Be sure to remove these,

	e.g.

	'Bolivia (Plurinational State of)' should be 'Bolivia',

	'Switzerland17' should be 'Switzerland'.
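The name clean-up described above can be sketched with two regular-expression replacements. This is a minimal illustration, assuming a hypothetical Series called `names` that holds made-up values mirroring the two examples; it is not the assignment's reference solution.

```python
import pandas as pd

# Hypothetical sample names, chosen to mirror the two examples in the prompt.
names = pd.Series(['Bolivia (Plurinational State of)', 'Switzerland17', 'Japan'])

cleaned = (
    names
    .str.replace(r'\s*\(.*\)', '', regex=True)  # drop a parenthesised suffix
    .str.replace(r'\d+', '', regex=True)        # drop footnote digits
    .str.strip()                                # trim leftover whitespace
)
print(cleaned.tolist())
```

Passing `regex=True` explicitly matters here, because recent pandas versions treat the pattern as a literal string by default.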


	Next, load the GDP data from the file world_bank.csv, which is a csv containing countries' GDP from 1960 to 2015 from World Bank. Call this DataFrame GDP.

	Make sure to skip the header, and rename the following list of countries:

	"Korea, Rep.": "South Korea",
	"Iran, Islamic Rep.": "Iran",
	"Hong Kong SAR, China": "Hong Kong"


	Finally, load the Scimago Journal and Country Rank data for Energy Engineering and Power Technology from the file scimagojr-3.xlsx, which ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame ScimEn.

	Join the three datasets: GDP, Energy, and ScimEn into a new dataset (using the intersection of country names). Use only the last 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr 'Rank' (Rank 1 through 15).

	The index of this DataFrame should be the name of the country, and the columns should be ['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'].

	This function should return a DataFrame with 20 columns and 15 entries.
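The join step can be illustrated on toy data. The sketch below assumes three tiny hypothetical frames (`energy`, `gdp`, `scim`) with invented values; only the `how='inner'` merge on the country name and the rank filter correspond to what the question describes.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the three datasets (values are made up).
energy = pd.DataFrame({'Country': ['China', 'Japan', 'Aruba'],
                       'Energy Supply': [1.0, 2.0, 3.0]})
gdp = pd.DataFrame({'Country': ['China', 'Japan', 'Chad'],
                    '2015': [10.0, 20.0, 30.0]})
scim = pd.DataFrame({'Country': ['Japan', 'China', 'Peru'],
                     'Rank': [3, 1, 40]})

# how='inner' keeps only the countries that appear in every frame.
merged = (scim.merge(energy, on='Country', how='inner')
              .merge(gdp, on='Country', how='inner'))

# Index by country, order by rank, and keep ranks 1 through 15.
top = merged.set_index('Country').sort_values('Rank')
top = top[top['Rank'] <= 15]
print(top)
```

Aruba, Chad and Peru each appear in only one frame, so the inner joins drop them.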

	import pandas as pd
	import numpy as np

	def answer_one():
	    # Energy data: keep only the country rows, which drops the header and footer.
	    energy = pd.read_excel('Energy Indicators.xls')
	    energy = energy[16:243]
	    energy = energy.drop(['Unnamed: 0', 'Unnamed: 1'], axis=1)
	    energy = energy.rename(columns={'Environmental Indicators: Energy': 'Country',
	                                    'Unnamed: 3': 'Energy Supply',
	                                    'Unnamed: 4': 'Energy Supply per Capita',
	                                    'Unnamed: 5': '% Renewable'})
	    energy = energy.replace('...', np.nan)
	    energy['Energy Supply'] *= 1000000  # petajoules -> gigajoules

	    # Strip footnote digits, then anything from the first '(' onwards.
	    energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)

	    def braces(data):
	        i = data.find('(')
	        if i > -1:
	            data = data[:i]
	        return data.strip()

	    energy['Country'] = energy['Country'].apply(braces)
	    d = {"Republic of Korea": "South Korea",
	         "United States of America": "United States",
	         "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
	         "China, Hong Kong Special Administrative Region": "Hong Kong",
	         "Bolivia (Plurinational State of)": "Bolivia",
	         "Switzerland17": "Switzerland"}
	    energy.replace({"Country": d}, inplace=True)

	    # GDP data: the first four rows of the CSV are header text.
	    GDP = pd.read_csv('world_bank.csv', skiprows=4)
	    GDP.replace({"Korea, Rep.": "South Korea",
	                 "Iran, Islamic Rep.": "Iran",
	                 "Hong Kong SAR, China": "Hong Kong"}, inplace=True)
	    GDP.rename(columns={'Country Name': 'Country'}, inplace=True)

	    ScimEn = pd.read_excel('scimagojr-3.xlsx')

	    # Inner joins keep only the countries present in all three datasets.
	    df1 = pd.merge(energy, GDP, how='inner', on='Country')
	    df = pd.merge(df1, ScimEn, how='inner', on='Country')
	    df.set_index('Country', inplace=True)
	    df = df[['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015']]

	    # Order by Rank and keep the top 15 countries.
	    df = df.sort_values('Rank')
	    return df.head(15)

	answer_one()



	Rank Documents Citable documents Citations Self-citations Citations per document H index Energy Supply Energy Supply per Capita % Renewable 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
	Country
	China 1 127050 126767 597237 411683 4.70 138 1.271910e+11 93.0 19.754910 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12
	United States 2 96661 94747 792274 265436 8.20 230 9.083800e+10 286.0 11.570980 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13
	Japan 3 30504 30287 223024 61554 7.31 134 1.898400e+10 149.0 10.232820 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12
	United Kingdom 4 20944 20357 206091 37874 9.84 139 7.920000e+09 124.0 10.600470 2.419631e+12 2.482203e+12 2.470614e+12 2.367048e+12 2.403504e+12 2.450911e+12 2.479809e+12 2.533370e+12 2.605643e+12 2.666333e+12
	Russian Federation 5 18534 18301 34266 12422 1.85 57 3.070900e+10 214.0 17.288680 1.385793e+12 1.504071e+12 1.583004e+12 1.459199e+12 1.524917e+12 1.589943e+12 1.645876e+12 1.666934e+12 1.678709e+12 1.616149e+12
	Canada 6 17899 17620 215003 40930 12.01 149 1.043100e+10 296.0 61.945430 1.564469e+12 1.596740e+12 1.612713e+12 1.565145e+12 1.613406e+12 1.664087e+12 1.693133e+12 1.730688e+12 1.773486e+12 1.792609e+12
	Germany 7 17027 16831 140566 27426 8.26 126 1.326100e+10 165.0 17.901530 3.332891e+12 3.441561e+12 3.478809e+12 3.283340e+12 3.417298e+12 3.542371e+12 3.556724e+12 3.567317e+12 3.624386e+12 3.685556e+12
	India 8 15005 14841 128763 37209 8.58 115 3.319500e+10 26.0 14.969080 1.265894e+12 1.374865e+12 1.428361e+12 1.549483e+12 1.708459e+12 1.821872e+12 1.924235e+12 2.051982e+12 2.200617e+12 2.367206e+12
	France 9 13153 12973 130632 28601 9.93 114 1.059700e+10 166.0 17.020280 2.607840e+12 2.669424e+12 2.674637e+12 2.595967e+12 2.646995e+12 2.702032e+12 2.706968e+12 2.722567e+12 2.729632e+12 2.761185e+12
	South Korea 10 11983 11923 114675 22595 9.57 104 1.100700e+10 221.0 2.279353 9.410199e+11 9.924316e+11 1.020510e+12 1.027730e+12 1.094499e+12 1.134796e+12 1.160809e+12 1.194429e+12 1.234340e+12 1.266580e+12
	Italy 11 10964 10794 111850 26661 10.20 106 6.530000e+09 109.0 33.667230 2.202170e+12 2.234627e+12 2.211154e+12 2.089938e+12 2.125185e+12 2.137439e+12 2.077184e+12 2.040871e+12 2.033868e+12 2.049316e+12
	Spain 12 9428 9330 123336 23964 13.08 115 4.923000e+09 106.0 37.968590 1.414823e+12 1.468146e+12 1.484530e+12 1.431475e+12 1.431673e+12 1.417355e+12 1.380216e+12 1.357139e+12 1.375605e+12 1.419821e+12
	Iran 13 8896 8819 57470 19125 6.46 72 9.172000e+09 119.0 5.707721 3.895523e+11 4.250646e+11 4.289909e+11 4.389208e+11 4.677902e+11 4.853309e+11 4.532569e+11 4.445926e+11 4.639027e+11 NaN
	Australia 14 8831 8725 90765 15606 10.28 107 5.386000e+09 231.0 11.810810 1.021939e+12 1.060340e+12 1.099644e+12 1.119654e+12 1.142251e+12 1.169431e+12 1.211913e+12 1.241484e+12 1.272520e+12 1.301251e+12
	Brazil 15 8668 8596 60702 14396 7.00 86 1.214900e+10 59.0 69.648030 1.845080e+12 1.957118e+12 2.056809e+12 2.054215e+12 2.208872e+12 2.295245e+12 2.339209e+12 2.409740e+12 2.412231e+12 2.319423e+12
	Question 2 (6.6%)

	The previous question joined three datasets then reduced this to just the top 15 entries. When you joined the datasets, but before you reduced this to the top 15 items, how many entries did you lose?

	This function should return a single number.

	%%HTML

	<svg width="800" height="300">

	<circle cx="150" cy="180" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="blue" />

	<circle cx="200" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="red" />

	<circle cx="100" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="green" />

	<line x1="150" y1="125" x2="300" y2="150" stroke="black" stroke-width="2" fill="black" stroke-dasharray="5,3"/>

	<text x="300" y="165" font-family="Verdana" font-size="35">Everything but this!</text>

	</svg>

	Everything but this!

	def answer_two():
	    # The commented lines sketch the computation. energy, GDP and ScimEn are
	    # local to answer_one(), so the two lengths appear hard-coded below.
	    #outer=pd.merge(pd.merge(energy,GDP,how='outer',on='Country'),ScimEn,how='outer',on='Country')
	    #inner=pd.merge(pd.merge(energy,GDP,how='inner',on='Country'),ScimEn,how='inner',on='Country')
	    #return len(outer)-len(inner)
	    return 318 - 162

	answer_two()

	156
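The idea of counting lost entries can be sketched as the size of the union minus the size of the intersection. The frames `a`, `b` and `c` below are hypothetical toy data, not the assignment files.

```python
import pandas as pd

# Hypothetical toy frames sharing a 'Country' key.
a = pd.DataFrame({'Country': ['China', 'Japan', 'Aruba']})
b = pd.DataFrame({'Country': ['China', 'Japan', 'Chad']})
c = pd.DataFrame({'Country': ['China', 'Peru']})

# Outer joins keep every country seen anywhere; inner joins keep only shared ones.
outer = a.merge(b, on='Country', how='outer').merge(c, on='Country', how='outer')
inner = a.merge(b, on='Country', how='inner').merge(c, on='Country', how='inner')

lost = len(outer) - len(inner)
print(lost)
```

Here five distinct countries appear overall and only China is in all three frames, so four entries are lost.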

	Answer the following questions in the context of only the top 15 countries by Scimagojr Rank (aka the DataFrame returned by answer_one())
	Question 3 (6.6%)

	What is the average GDP over the last 10 years for each country? (exclude missing values from this calculation.)

	This function should return a Series named avgGDP with 15 countries and their average GDP sorted in descending order.

	def answer_three():
	    Top15 = answer_one()
	    years = Top15[['2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
	                   '2014', '2015']]
	    # mean() skips NaN by default, so missing years are excluded.
	    Top15['avgGDP'] = years.mean(axis=1)
	    return Top15['avgGDP'].sort_values(ascending=False)

	answer_three()

	Country
	United States 1.536434e+13
	China 6.348609e+12
	Japan 5.542208e+12
	Germany 3.493025e+12
	France 2.681725e+12
	United Kingdom 2.487907e+12
	Brazil 2.189794e+12
	Italy 2.120175e+12
	India 1.769297e+12
	Canada 1.660647e+12
	Russian Federation 1.565459e+12
	Spain 1.418078e+12
	Australia 1.164043e+12
	South Korea 1.106715e+12
	Iran 4.441558e+11
	Name: avgGDP, dtype: float64

	Question 4 (6.6%)

	By how much had the GDP changed over the 10 year span for the country with the 6th largest average GDP?

	This function should return a single number.

	def answer_four():
	    Top15 = answer_one()
	    Top15['avgGDP'] = answer_three()
	    Top15.sort_values(['avgGDP'], ascending=False, inplace=True)
	    # Row 5 (zero-based) is the country with the 6th largest average GDP.
	    return abs(Top15.iloc[5]['2006'] - Top15.iloc[5]['2015'])

	answer_four()

	246702696075.3999

	Question 5 (6.6%)

	What is the mean Energy Supply per Capita?

	This function should return a single number.

	def answer_five():
	    Top15 = answer_one()
	    return Top15['Energy Supply per Capita'].mean()

	answer_five()

	157.59999999999999

	Question 6 (6.6%)

	What country has the maximum % Renewable and what is the percentage?

	This function should return a tuple with the name of the country and the percentage.

	def answer_six():
	    Top15 = answer_one()
	    # idxmax() returns the index label (the country name) of the largest value.
	    return (Top15['% Renewable'].idxmax(), Top15['% Renewable'].max())

	answer_six()

	('Brazil', 69.648030000000006)

	Question 7 (6.6%)

	Create a new column that is the ratio of Self-Citations to Total Citations. What is the maximum value for this new column, and what country has the highest ratio?

	This function should return a tuple with the name of the country and the ratio.

	def answer_seven():
	    Top15 = answer_one()
	    Top15['Ratio'] = Top15['Self-citations'] / Top15['Citations']
	    # Country name first, then the ratio, as the question asks.
	    return (Top15['Ratio'].idxmax(), Top15['Ratio'].max())

	answer_seven()

	('China', 0.68931261793894216)

	Question 8 (6.6%)

	Create a column that estimates the population using Energy Supply and Energy Supply per capita. What is the third most populous country according to this estimate?

	This function should return a single string value.

	def answer_eight():
	    Top15 = answer_one()
	    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
	    Top15.sort_values('popEst', ascending=False, inplace=True)
	    # The row name is the index label, i.e. the country.
	    return Top15.iloc[2].name

	answer_eight()

	'United States'

	Question 9 (6.6%)

	Create a column that estimates the number of citable documents per person. What is the correlation between the number of citable documents per capita and the energy supply per capita? Use the .corr() method, (Pearson's correlation).

	This function should return a single number.

	(Optional: Use the built-in function plot9() to visualize the relationship between Energy Supply per Capita vs. Citable docs per Capita)

	def answer_nine():
	    Top15 = answer_one()
	    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
	    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['popEst']
	    # Series.corr() computes Pearson's correlation by default.
	    return Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])

	answer_nine()

	0.79400104354429457

	def plot9():
	    import matplotlib.pyplot as plt
	    %matplotlib inline

	    Top15 = answer_one()
	    Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
	    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']
	    Top15.plot(x='Citable docs per Capita', y='Energy Supply per Capita', kind='scatter', xlim=[0, 0.0006])

	#plot9() # Be sure to comment out plot9() before submitting the assignment!

	Question 10 (6.6%)

	Create a new column with a 1 if the country's % Renewable value is at or above the median for all countries in the top 15, and a 0 if the country's % Renewable value is below the median.

	This function should return a series named HighRenew whose index is the country name sorted in ascending order of rank.

	def answer_ten():
	    Top15 = answer_one()
	    limit = Top15['% Renewable'].median()
	    # 1 when at or above the median, 0 otherwise.
	    Top15['HighRenew'] = np.where(Top15['% Renewable'] >= limit, 1, 0)
	    Top15.sort_values('Rank', ascending=True, inplace=True)
	    return Top15['HighRenew']

	answer_ten()

	Country
	China 1
	United States 0
	Japan 0
	United Kingdom 0
	Russian Federation 1
	Canada 1
	Germany 1
	India 0
	France 1
	South Korea 0
	Italy 1
	Spain 1
	Iran 0
	Australia 0
	Brazil 1
	Name: HighRenew, dtype: int64

	Question 11 (6.6%)

	Use the following dictionary to group the Countries by Continent, then create a DataFrame that displays the sample size (the number of countries in each continent bin), and the sum, mean, and std deviation for the estimated population of each country.

	ContinentDict = {'China':'Asia',
	'United States':'North America',
	'Japan':'Asia',
	'United Kingdom':'Europe',
	'Russian Federation':'Europe',
	'Canada':'North America',
	'Germany':'Europe',
	'India':'Asia',
	'France':'Europe',
	'South Korea':'Asia',
	'Italy':'Europe',
	'Spain':'Europe',
	'Iran':'Asia',
	'Australia':'Australia',
	'Brazil':'South America'}

	This function should return a DataFrame with index named Continent ['Asia', 'Australia', 'Europe', 'North America', 'South America'] and columns ['size', 'sum', 'mean', 'std']

	def answer_eleven():
	    Top15 = answer_one()
	    ContinentDict = {'China':'Asia',
	                     'United States':'North America',
	                     'Japan':'Asia',
	                     'United Kingdom':'Europe',
	                     'Russian Federation':'Europe',
	                     'Canada':'North America',
	                     'Germany':'Europe',
	                     'India':'Asia',
	                     'France':'Europe',
	                     'South Korea':'Asia',
	                     'Italy':'Europe',
	                     'Spain':'Europe',
	                     'Iran':'Asia',
	                     'Australia':'Australia',
	                     'Brazil':'South America'}
	    df = pd.DataFrame(columns=['size', 'sum', 'mean', 'std'])
	    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
	    # Grouping by a dict maps each index label (country) to its continent.
	    for group, frame in Top15.groupby(ContinentDict):
	        df.loc[group] = [len(frame), frame['popEst'].sum(), frame['popEst'].mean(), frame['popEst'].std()]
	    return df

	answer_eleven()

	               size           sum          mean           std
	Asia            5.0  2.898666e+09  5.797333e+08  6.790979e+08
	Australia       1.0  2.331602e+07  2.331602e+07           NaN
	Europe          6.0  4.579297e+08  7.632161e+07  3.464767e+07
	North America   2.0  3.528552e+08  1.764276e+08  1.996696e+08
	South America   1.0  2.059153e+08  2.059153e+08           NaN
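As an alternative to building the summary row by row, the same four statistics can be requested in one `agg` call. This is a sketch on hypothetical toy data (`pop` and `continents` are invented for illustration), not the gist author's approach.

```python
import pandas as pd

# Hypothetical population estimates indexed by country (values are made up).
pop = pd.DataFrame({'popEst': [100.0, 300.0, 50.0]},
                   index=['China', 'Japan', 'France'])
continents = {'China': 'Asia', 'Japan': 'Asia', 'France': 'Europe'}

# Grouping by a dict maps each index label to its group key.
stats = pop.groupby(continents)['popEst'].agg(['size', 'sum', 'mean', 'std'])
stats.index.name = 'Continent'
print(stats)
```

A group with a single country has an undefined sample standard deviation, which is why `std` shows NaN for such groups.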
	Question 12 (6.6%)

	Cut % Renewable into 5 bins. Group Top15 by the Continent, as well as these new % Renewable bins. How many countries are in each of these groups?

	This function should return a Series with a MultiIndex of Continent, then the bins for % Renewable. Do not include groups with no countries.

	def answer_twelve():
	    Top15 = answer_one()
	    ContinentDict = {'China':'Asia',
	                     'United States':'North America',
	                     'Japan':'Asia',
	                     'United Kingdom':'Europe',
	                     'Russian Federation':'Europe',
	                     'Canada':'North America',
	                     'Germany':'Europe',
	                     'India':'Asia',
	                     'France':'Europe',
	                     'South Korea':'Asia',
	                     'Italy':'Europe',
	                     'Spain':'Europe',
	                     'Iran':'Asia',
	                     'Australia':'Australia',
	                     'Brazil':'South America'}
	    # Five equal-width bins over the range of % Renewable.
	    Top15['Bins'] = pd.cut(Top15['% Renewable'], 5)
	    # observed=True leaves out continent/bin combinations with no countries.
	    return Top15.groupby([ContinentDict, Top15['Bins']], observed=True).size()

	answer_twelve()

	               Bins
	Asia           (2.212, 15.753]     4
	               (15.753, 29.227]    1
	Australia      (2.212, 15.753]     1
	Europe         (2.212, 15.753]     1
	               (15.753, 29.227]    3
	               (29.227, 42.701]    2
	North America  (2.212, 15.753]     1
	               (56.174, 69.648]    1
	South America  (56.174, 69.648]    1
	dtype: int64
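To see how `pd.cut` assigns values to equal-width bins, here is a small sketch on a hypothetical Series `renewable` with invented percentages.

```python
import pandas as pd

# Hypothetical % Renewable values (made up for illustration).
renewable = pd.Series([2.0, 10.0, 35.0, 70.0], index=['A', 'B', 'C', 'D'])

# An integer second argument asks for that many equal-width bins.
bins = pd.cut(renewable, 5)

# Each value gets an interval category; .cat.codes gives its bin position.
print(bins)
print(bins.cat.codes.tolist())
```

The lowest bin edge is nudged slightly below the minimum so that the smallest value is included, which is why the first interval in the output above starts at 2.212 rather than at the minimum itself.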

	Question 13 (6.6%)

	Convert the Population Estimate series to a string with thousands separator (using commas). Do not round the results.

	e.g. 317615384.61538464 -> 317,615,384.61538464

	This function should return a Series PopEst whose index is the country name and whose values are the population estimate string.

	def answer_thirteen():
	    Top15 = answer_one()
	    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
	    # '{:,}' inserts thousands separators without rounding.
	    Top15['popEst'] = Top15['popEst'].apply('{:,}'.format)
	    return Top15['popEst']

	answer_thirteen()

	Country
	China 1,367,645,161.2903225
	United States 317,615,384.61538464
	Japan 127,409,395.97315437
	United Kingdom 63,870,967.741935484
	Russian Federation 143,500,000.0
	Canada 35,239,864.86486486
	Germany 80,369,696.96969697
	India 1,276,730,769.2307692
	France 63,837,349.39759036
	South Korea 49,805,429.864253394
	Italy 59,908,256.880733944
	Spain 46,443,396.2264151
	Iran 77,075,630.25210084
	Australia 23,316,017.316017315
	Brazil 205,915,254.23728815
	Name: popEst, dtype: object
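The formatting itself needs no pandas. A minimal sketch using the example value from the question:

```python
# The example value from the question statement.
value = 317615384.61538464

# The ',' format option adds thousands separators; with no precision given,
# the float keeps its full shortest representation, so nothing is rounded.
formatted = '{:,}'.format(value)
print(formatted)
```

An f-string with the same format spec, `f'{value:,}'`, is equivalent.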

	Optional

	Use the built in function plot_optional() to see an example visualization.

	def plot_optional():
	    import matplotlib.pyplot as plt
	    %matplotlib inline
	    Top15 = answer_one()
	    ax = Top15.plot(x='Rank', y='% Renewable', kind='scatter',
	                    c=['#e41a1c','#377eb8','#e41a1c','#4daf4a','#4daf4a','#377eb8','#4daf4a','#e41a1c',
	                       '#4daf4a','#e41a1c','#4daf4a','#4daf4a','#e41a1c','#dede00','#ff7f00'],
	                    xticks=range(1,16), s=6*Top15['2014']/10**10, alpha=.75, figsize=[16,6]);

	    # Label each bubble with its country name.
	    for i, txt in enumerate(Top15.index):
	        ax.annotate(txt, [Top15['Rank'].iloc[i], Top15['% Renewable'].iloc[i]], ha='center')

	    print("This is an example of a visualization that can be created to help understand the data. "
	          "This is a bubble chart showing % Renewable vs. Rank. The size of the bubble corresponds to the countries' "
	          "2014 GDP, and the color corresponds to the continent.")

	#plot_optional()


SIKSHAPATH Latest Articles

Python for data science assignment solutions week 3 2022.


Are you looking for help with the Python for Data Science NPTEL week 3 assignment? This article provides hints for the week 3 assignment answers.

Python for Data Science NPTEL Assignment Solutions Week 3

Q1. Choose the appropriate command(s) to filter those booking details whose reservation_status are a No-show?

a. data_hotel_ns = data_hotel.loc[data_hotel.reservation_status = 'No-Show']

b. data_hotel_ns = data_hotel[data_hotel.reservation_status == 'No-Show']

c. data_hotel_ns = data_hotel.reservation_status.loc[data_hotel.isin(['No-Show'])]

d. data_hotel_ns = data_hotel.loc[data_hotel.reservation_status.isin(['No-Show'])]

Answer: b. data_hotel_ns = data_hotel[data_hotel.reservation_status == 'No-Show']

d. data_hotel_ns = data_hotel.loc[data_hotel.reservation_status.isin(['No-Show'])]
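Both accepted answers are boolean-mask filters. The sketch below assumes a tiny hypothetical `data_hotel` frame (the real dataset is not shown in the article) and checks that an equality comparison and `isin` select the same rows.

```python
import pandas as pd

# Hypothetical stand-in for the hotel bookings data (values are made up).
data_hotel = pd.DataFrame({
    'reservation_status': ['Check-Out', 'No-Show', 'Canceled', 'No-Show'],
    'adults': [2, 1, 2, 3],
})

# Filter with an equality comparison ...
by_comparison = data_hotel[data_hotel.reservation_status == 'No-Show']
# ... or with isin(), which also accepts several values at once.
by_isin = data_hotel.loc[data_hotel.reservation_status.isin(['No-Show'])]

print(by_comparison)
```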


Q2. From the same data, find how many bookings were not canceled in the year 2017?

d. None of the above

Answer : a. 9064

Q3. From the total bookings that were made in 2017 and not canceled, which month had the highest number of repeated guests?

b. February

Answer: c. January

Q4. Which of the following commands can be used to create a variable Flag, and set the values as Premium when the rating is equal to or greater than 3.25, and otherwise as Regular?

a. dt_cocoa['Flag'] = ["Premium" if x > 3.25 else "Regular" for x in dt_cocoa['Rating']]

b. dt_cocoa['Flag'] = ["Premium" if x >= 3.25 else "Regular" for x in dt_cocoa['Rating']]

c. dt_cocoa["Flag"] = np.where(dt_cocoa["Rating"] < 3.25, "Regular", "Premium"

Answer: b. dt_cocoa['Flag'] = ["Premium" if x >= 3.25 else "Regular" for x in dt_cocoa['Rating']]
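A short sketch of the flagging logic, assuming a hypothetical three-row `dt_cocoa` frame. It shows the list comprehension from the answer next to a vectorised `np.where` call written with the same `>= 3.25` condition.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings, including the boundary value 3.25.
dt_cocoa = pd.DataFrame({'Rating': [3.0, 3.25, 3.75]})

# List comprehension, as in the selected answer.
dt_cocoa['Flag'] = ['Premium' if x >= 3.25 else 'Regular' for x in dt_cocoa['Rating']]

# Vectorised form of the same rule.
flag_np = np.where(dt_cocoa['Rating'] >= 3.25, 'Premium', 'Regular')

print(dt_cocoa)
```

The boundary row is the one to watch: with `>` instead of `>=`, a rating of exactly 3.25 would be flagged Regular.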

Q5. Which instruction can be used to impute the missing values in the column Review Date from the dataframe dt_cocoa by grouping the records company-wise?

a. dt_cocoa['Review Date'] = dt_cocoa.groupby(['Company'])['Review Date'].apply(lambda x: x.fillna(x.mode().iloc[0]))

b. dt_cocoa['Review Date'] = dt_cocoa.groupby(['Company'])['Review Date'].apply(lambda X: x.fillna(x.mean()))

c. dt_cocoa['Review Date'] = dt_cocoa.groupby(['Company'])['Review Date'].apply(lambda x: x.fillna(x.mode()))

Answer: a. dt_cocoa['Review Date'] = dt_cocoa.groupby(['Company'])['Review Date'].apply(lambda x: x.fillna(x.mode().iloc[0]))
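Group-wise mode imputation can be tried on a toy frame. Note that this sketch swaps in `transform` for the `apply` shown in the answer, since `transform` always returns a result aligned to the original index; the `dt_cocoa` values here are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical cocoa reviews with one missing date per company.
dt_cocoa = pd.DataFrame({
    'Company': ['A', 'A', 'A', 'B', 'B'],
    'Review Date': [2012, 2012, np.nan, 2015, np.nan],
})

# Fill each company's gaps with that company's most frequent value.
# mode() returns a Series (there can be ties), so .iloc[0] picks the first.
dt_cocoa['Review Date'] = (
    dt_cocoa.groupby('Company')['Review Date']
    .transform(lambda x: x.fillna(x.mode().iloc[0]))
)
print(dt_cocoa)
```

This also shows why option c falls short: `x.mode()` without `.iloc[0]` is a Series, not a scalar fill value.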

Q6. After checking the data summary, which feature requires a data conversion considering the data values held?

b. Review Date

Answer: b. Review Date

Q7. What is the maximum average rating for the cocoa companies based out of Guatemala?

d. None of the above

Answer: c. 3.42

Q8. Which pandas function is used to stack the dataframes vertically?

a. pd.merge()

b. pd.concat()

Answer: b. pd.concat()
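A minimal sketch of vertical stacking with `pd.concat`, using two hypothetical frames named `top` and `bottom`:

```python
import pandas as pd

# Two hypothetical frames with the same column.
top = pd.DataFrame({'x': [1, 2]})
bottom = pd.DataFrame({'x': [3]})

# The default axis=0 stacks rows; ignore_index=True renumbers them 0..n-1.
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)
```

`pd.merge()`, by contrast, combines frames side by side on key columns.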

Q9. Of the following set of statements, which of them can be used to extract the column Direction as a separate dataframe?

a. df_weather[['Direction']]

b. df_weather.iloc[:,0]

c. df_weather.loc[:, ['Direction']]

Answer: a. df_weather[['Direction']]
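The difference between single and double brackets can be checked directly. The `df_weather` frame below is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical weather data.
df_weather = pd.DataFrame({'Direction': ['N', 'S'], 'Speed': [5, 7]})

# A list of column names returns a DataFrame, even for a single column.
as_frame = df_weather[['Direction']]
# A bare column name returns a Series.
as_series = df_weather['Direction']

print(type(as_frame).__name__, type(as_series).__name__)
```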

Q10. A file “Students.csv” contains the attendance and scores of three separate students. This dataset is loaded into a dataframe df_study and a cross table is obtained from the same dataframe which results in the following output

Which one of these students’ average score across all subjects was the lowest? Which subject has the highest average score across students?

a. Harini, Maths

b. Sathi, Maths

c. Harini, Physics

d. Rekha, Maths

Answer: b. Sathi, Maths



Disclaimer: These answers are provided only as a reference for students. This website does not guarantee that the answers are 100% correct, and urges you to complete your assignment yourself.


Problem with question 1, week 3, Introduction_to_Data_Science_in_Python coursera

I'm seriously stuck with this question and have no idea what I am doing wrong. This is the error I keep getting.

I can remove the error and run the code, but the output remains wrong: I get something like this, but I should get a ranked list from 1 to 15 with no repeated countries.

And this is the code I am using:

Any ideas about what is wrong with this code?? Would highly appreciate any help.

If helpful, this is the question:

Question 1 Load the energy data from the file assets/Energy Indicators.xls, which is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013, and should be put into a DataFrame with the variable name of Energy.

Keep in mind that this is an Excel file, and not a comma separated values file. Also, make sure to exclude the footer and header information from the datafile. The first two columns are unnecessary, so you should get rid of them, and you should change the column labels so that the columns are:

['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']

Convert Energy Supply to gigajoules (Note: there are 1,000,000 gigajoules in a petajoule). For all countries which have missing data (e.g. data with "...") make sure this is reflected as np.NaN values.

Rename the following list of countries (for use in later questions):

"Republic of Korea": "South Korea", "United States of America": "United States", "United Kingdom of Great Britain and Northern Ireland": "United Kingdom", "China, Hong Kong Special Administrative Region": "Hong Kong"

There are also several countries with numbers and/or parentheses in their name. Be sure to remove these, e.g. 'Bolivia (Plurinational State of)' should be 'Bolivia'. 'Switzerland17' should be 'Switzerland'.

Next, load the GDP data from the file assets/world_bank.csv, which is a csv containing countries' GDP from 1960 to 2015 from World Bank. Call this DataFrame GDP.

Make sure to skip the header, and rename the following list of countries:

"Korea, Rep.": "South Korea", "Iran, Islamic Rep.": "Iran", "Hong Kong SAR, China": "Hong Kong"

Finally, load the Scimago Journal and Country Rank data for Energy Engineering and Power Technology from the file assets/scimagojr-3.xlsx, which ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame ScimEn.

Join the three datasets: GDP, Energy, and ScimEn into a new dataset (using the intersection of country names). Use only the last 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr 'Rank' (Rank 1 through 15).

The index of this DataFrame should be the name of the country, and the columns should be ['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'].

This function should return a DataFrame with 20 columns and 15 entries, and the rows of the DataFrame should be sorted by "Rank".

The code you are running ends with raise NotImplementedError() and it raises a NotImplementedError. Seems like it's working to me or we are missing an import part of the code. What result were you expecting? – Mark Commented May 18, 2021 at 21:05
Remove the line: raise NotImplementedError() – Malo Commented May 18, 2021 at 21:05
Coursera has its own Discussion forums for each course. That might be a better place for this question as the people there would have more context for these questions. – sanster9292 Commented May 19, 2021 at 1:36

Do you know what the following means?

It may help to look up Python Exceptions.

If you delete this line, it shouldn't give this error anymore. This was probably put in the code to indicate that it was not finished, so you may encounter other unexpected behaviour or error messages if that is indeed the case.
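What such a placeholder line does can be sketched in a few lines. This `answer_one` is a hypothetical stub written for illustration; the asker's actual code is not shown in the question.

```python
def answer_one():
    # Hypothetical stub: the placeholder fires as soon as the function is called.
    raise NotImplementedError()

try:
    answer_one()
    outcome = 'returned normally'
except NotImplementedError:
    outcome = 'raised NotImplementedError'

print(outcome)
```

Any code placed after the `raise` inside the function never runs, which is why the line has to go once the real implementation is written.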

Thank you!! Didn't know it and will read about Python Exceptions. But anyway my output for the code is wrong, and this compromises the answer to the next questions of the assignment. – pegasus123 Commented May 18, 2021 at 21:17
Glad to help! It might be made this way specifically to have readers trying to understand the code. If you follow the lines of the code, do you understand what happens? If it is a course I would expect there to be preceding chapters where you can learn this. – jrbergen Commented May 18, 2021 at 21:24


Python For Data Science

Note: This exam date is subject to change based on seat availability. You can check the final exam date on your hall ticket.

Course layout.

Reading files
Exploratory data analysis
Data preparation and preprocessing
Scatter plot
if-else family
for loop with if break
Predicting price of pre-owned cars
Classifying personal income

Books and references

Instructor bio.

Prof. Ragunathan Rengasamy

Course certificate.

Department of Computer Science The University of Texas at Austin

Computational Astrophysics - HSRA (Summer 2024) MTWThF 1:00 pm - 5:00 pm, WEL 3.310

Instructor: Dr. Shyamal Mitra E-mail: [email protected]

Peer Mentor: Juhi Malwade E-mail: [email protected]

Peer Mentor: Hanyu Wei E-mail: [email protected]

Peer Mentor: Anthony Yang E-mail: [email protected]

Scope of the Course

Research goal.

We will use computational geometry to obtain the size and center of clusters of galaxies and data analytics to determine member galaxies and outliers. We will compute the velocity dispersion of the clusters and their mass-to-light ratio. Specifically, one of the questions that we will try to answer is - are there interconnections between clusters and are the clusters themselves clustered to form superclusters? We will provide 3-dimensional maps of the distribution of galaxies.

Research Journal

Online courses, assignments, study groups.

All our class discussion will be on Ed Discussion on Canvas. We expect your posts to be professional and courteous to every member in the class.

Your Responsibilities in This Class

General policies.

IMAGES

Week 3
NPTEL: Python For Data Science Week 3 Assignment 3 Quiz Answers |NPTEL Python DS Course Assignment 3
NPTEL Python for Data Science Week 3 Quiz Assignment Solutions || August 2020 || Swayam
PYTHON FOR DATA SCIENCE
NPTEL Python for Data Science Assignment 3 Answers 2022
NPTEL: Programming ,Data Structures and Algorithms Using Python Week 3 Programming Assignment Answer

VIDEO

Assignment
NPTEL Python for Data Science Week 3 Quiz Assignment Solutions and Answers
NPTEL Data Analytics with Python Week3 Quiz Assignment Solutions
NPTEL Python for Data Science Week 3 Quiz Assignment Solutions
NPTEL Python for Data Science Week 3 Quiz answers with detailed proof of each answer
Python for Data Science Week 2 Assignment Answers

COMMENTS

Python for Data Science Week 3 Assignment 3 Solution
#pythonfordatascience #nptel #swayam #python #datascience Python for Data Science All week Assignment Solution - https://www.youtube.com/playlist?list=PL__28...
Introduction-to-Data-Science-in-Python-Week-3--Assignment-3/code at
Introduction to Data Science in Python Week 3-Assignment 3 - shikha7m/Introduction-to-Data-Science-in-Python-Week-3--Assignment-3
PYTHON FOR DATA SCIENCE
Hello guys, this is the solution for Week 3 of the course Python for Data Science from NPTEL/SWAYAM.-----...
Assignment 3 Solutions
NPTEL - PYTHON FOR DATA SCIENCE ASSIGNMENT - 3. Types of questions: MCQs - Multiple Choice Questions (a question has only one correct answer) MSQs - Multiple Select Questions (a question can have two, three or four correct options) In this case, equal weightage must be given to all options
NPTEL Python for Data Science Week 3 Assignment January 2024
Welcome to Week 3 of the NPTEL Python for Data Science course offered by IIT Madras in January 2024! In this assignment video, we will dive into various conc...
Introduction to Data Science in Python Assignment-3 · GitHub
Your solution file either prints output to the console (e.g. through the print() function), or it fails to run (e.g. throws an error). You must make sure that the .py or .ipynb file which you submit does not have errors. The output we received from the IPython interpreter was: <IPython.core.display.HTML object>.
Python for Data Science
Week 3 Feedback Form: Python for Data Science Dear Learners, Thank you for continuing with the course and hope you are enjoying it. ... Python for Data Science : Assignment 3 is live now!! Dear Learners, The lecture videos for Week 3 have been uploaded for the course "Python for Data Science". The lectures can be accessed using the following link:
Introduction-to-Data-Science-in-python/Assignment+3 .ipynb at master
This repository contains Ipython notebooks of assignments and tutorials used in the course introduction to data science in python, part of Applied Data Science using Python Specialization from Univ...
Applied-Data-Science-with-Python---Coursera/Introduction to Data
This project contains all the assignment's solution of university of Michigan. - sapanz/Applied-Data-Science-with-Python---Coursera
tchagau/Introduction-to-Data-Science-in-Python
This repository includes course assignments of Introduction to Data Science in Python on coursera by university of michigan - tchagau/Introduction-to-Data-Science-in-Python
Introduction to data science in python Assignment_3 Coursera
Assignment3.py. Assignment 3 - More Pandas. This assignment requires more individual learning than the last one did - you are encouraged to check out the pandas documentation to find functions or methods you might not have used yet, or ask questions on Stack Overflow and tag them as pandas and python related.
Python for Data Science
The course aims at equipping participants to be able to use python programming for solving data science problems.INTENDED AUDIENCE : Final Year Undergraduate...
Python For Data Science
Python For Data Science : Assignment 3 is live now!! Dear Learners, The lecture videos for Week 3 have been uploaded for the course "Python For Data Science". The lectures can be accessed using the following link: ... Assignment-3 for Week-3 is also released and can be accessed from the following link.
Python for Data Science
Week 3 Feedback Form: Python for Data Science!! Dear Learners, Thank you for continuing with the course and hope you are enjoying it. ... Python for Data Science : Assignment 3 is live now!! Dear Learners, The lecture videos for Week 3 have been uploaded for the course "Python for Data Science". The lectures can be accessed using the following ...
Python For Data Science Assignment Solutions Week 3 2022
Answer: b. Sathi, Maths. Disclaimer: These answers are provided only as a reference to help students. This website does not guarantee that the answers are 100% correct, and it urges you to complete your assignment yourself. Python for Data Science NPTEL Assignment Solutions Week 4.
python
Commented Nov 3, 2019 at 14:50 I don't have a PC right now so I can't check it but I suppose you have empty cell or NaN or ... in the excel - Natthaphon Hongcharoen
Python for Data Science
Reminder 3: Python for Data Science : Online Programming test (July 2022) Dear Learners, The criteria for ... Content and assignment for week 1 is already live, please check the course announcement page. Please use the discussion forums if you have any questions on this module.
NPTEL Python for Data Science Week3 Assignment
This video contains the solution for Week 3 Assignment NPTEL Python for Data Science course. This programming is explained in Google Colab.
python
Finally, load the Scimago Journal and Country Rank data for Energy Engineering and Power Technology from the file assets/scimagojr-3.xlsx, which ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame ScimEn.
Coursera_Intro_to_Data_Science_with_Python/Week3/Assignment
PDF PPHA 30545: Machine Learning for Public Policy Dr. Christopher Clapp
Lab 02 (Python) - F 01:30pm-02:50pm Keller 1022 Lab 03 (Python) - F 03:00pm-04:20pm Keller 1022 Professor: Chris Clapp (he/him) [email protected] Office Hours: TBD Keller 3039 or by appointment TAs: TBD Course Description It's an exciting time to study machine learning and data science more generally! We live in a digital era where
Python for Data Science Week 3: Assignment 3 Solutions || Jan 2023
Python for Data Science Week 3: Assignment 3 Solutions || Jan 2023 #nptel #pythondatascience
Python For Data Science
Week 1: BASICS OF PYTHON SPYDER (TOOL) ... by Gilbert Strang 2. Applied statistics and probability for engineers - by Douglas Montgomery 3. Mastering python for data science, Samir Madhavan. ... Average assignment score = 25% of average of best 3 assignments out of the total 4 assignments given in the course.
Computational Astrophysics (Summer 2024)
Assignments There will be programming assignments in Python and in data science using Jupyter notebooks. You will also have assignments in astronomy using the sources of data as mentioned above as well as other sources of online data. These assignments will have deadlines with late penalty for missed deadlines. Papers
Introduction-to-Data-Science-with-Python/Assignment+3.py at master
UMichigan's coursera Intro to DS with Python course - awongstory/Introduction-to-Data-Science-with-Python
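
The NPTEL grading rule quoted in the listing above (average assignment score = 25% of the average of the best 3 assignments out of 4) can be sketched in Python as follows. This is only an illustration: the function name, the parameter names, and the assumption that each assignment is scored out of 100 are hypothetical and do not come from the course page.

```python
def average_assignment_score(scores, best_n=3, weight=0.25):
    """Return weight * (mean of the best_n highest scores).

    Assumptions (not stated on the course page): each score is on a
    0-100 scale, and best_n / weight default to the quoted rule of
    "25% of the average of the best 3 assignments".
    """
    if len(scores) < best_n:
        raise ValueError("need at least best_n scores")
    # Sort descending and keep only the best_n highest scores.
    best = sorted(scores, reverse=True)[:best_n]
    return weight * (sum(best) / best_n)


# Hypothetical scores for the four assignments.
example_scores = [80, 90, 70, 100]
example_result = average_assignment_score(example_scores)
```

With the hypothetical scores above, the lowest score (70) is dropped, the remaining three average 90, and the weighted contribution is 22.5.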

Course Status :	Completed
Course Type :	Elective
Duration :	4 weeks
Category :
Credit Points :	1
Undergraduate
Start Date :	24 Jul 2023
End Date :	18 Aug 2023
Enrollment Ends :	07 Aug 2023
Exam Registration Ends :	21 Aug 2023
Exam Date :	24 Sep 2023 IST