Here are the functions in the pandas’ module or library:
import pandas as pd
DATA FRAME
data = {'Country': ['Belgium', 'India', 'Brazil'],
'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data,columns=['Country', 'Capital', 'Population'])
READ AND WRITE CSV
pd.read_csv('file.csv', header=None, nrows=5)
df.to_csv('myDataFrame.csv')
Read multiple sheets from the same file
xlsx = pd.ExcelFile('file.xls')
df = pd.read_excel(xlsx, 'Sheet1')
Read and Write to Excel
pd.read_excel('file.xlsx')
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read and Write to SQL Query or Database Table
(read_sql()is a convenience wrapper around read_sql_table() and read_sql_query())
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql(SELECT * FROM my_table;, engine)
pd.read_sql_table('my_table', engine)
pd.read_sql_query(SELECT * FROM my_table;', engine)
df.to_sql('myDf', engine)
Selection
Getting
Get one element
s['b']
-5
Get subset of a DataFrame
df[1:]
Country Capital Population
1 India New Delhi 1303171035
2 Brazil Brasilia 207847528
Selecting', Boolean Indexing and Setting
By Position
Select single value by row and and column
df.iloc([0], [0])
'Belgium'
df.iat([0], [0])
'Belgium'
By Label
Select single value by row and column labels
df.loc([0], ['Country'])
'Belgium'
df.at([0], ['Country'])
'Belgium'
By Label/Position
Select single row of subset of rows
df.ix[2]
Country Brazil
Capital Brasilia
Population 207847528
Select a single column of subset of columns
df.ix[:, 'Capital']
0 Brussels
1 New Delhi
2 Brasilia
Select rows and columns
df.ix[1, 'Capital']
'New Delhi'
Boolean Indexing
Series s where value is not >1
s[~(s > 1)]
s where value is <-1 or >2
s[(s < -1) | (s > 2)]
Use filter to adjust DataFrame
df[df['Population']>1200000000]
POWERED BY DATACAMP WORKSPACE
Set index a of Series s to 6
s['a'] = 6
Dropping
Drop values from rows (axis=0)
s.drop(['a', 'c'])
Drop values from columns(axis=1)
df.drop('Country', axis=1)
Sort and Rank
Sort by labels along an axis
df.sort_index()
Sort by the values along an axis
df.sort_values(by='Country')
Assign ranks to entries
df.rank()
Retrieving Series/DataFrame Information
Basic Information
(rows, columns)
df.shape
Describe index
df.index
Describe DataFrame columns
df.columns
Info on DataFrame
df.info()
Number of non-NA values
df.count()
Summary
Sum of values
df.sum()
Cumulative sum of values
df.cumsum()
Minimum/maximum values
df.min()/df.max()
Minimum/Maximum index value
df.idxmin()/df.idxmax()
Summary statistics
df.describe()
Mean of values
df.mean()
Median of values
df.median()
Applying Functions
f = lambda x: x*2
Apply function
df.apply(f)
Apply function element-wise
df.applynap(f)
Internal Data Alignment
NA values are introduced in the indices that don't overlap:
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
s + s3
a 10.0
b NaN
c 5.0
d 7.0
Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
s.add(s3, fill_value=0)
a 10.0
b -5.0
c 5.0
d 7.0
s.sub(s3, fill_value=2)
s.div(s3, fill_value=4)
s.mul(s3, fill_value=3)
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
SOURCE: DATACAMP, PANDAS.PYDATA.ORG
-----------------------------------------------------------------------
immadshahid.com