Data mining in Python
Data mining
- Data importing
- Cleaning
- Transforming
- Visualization
- Summary statistics
Data mining using NumPy, pandas, and Matplotlib
All code is in a Jupyter notebook
A. Data munging
1.1 Selecting and retrieving data
- Creating a dataframe
- Slicing a dataframe
- Data comparison
- Filtering with scalars
- Setting values with scalars
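The selection steps listed above can be sketched as follows. This is a minimal example on a made-up 4x3 dataframe; the row labels, column labels, and values are assumptions for illustration and do not come from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe: labels and values are illustrative only.
DF_obj = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=["row 1", "row 2", "row 3", "row 4"],
    columns=["column 1", "column 2", "column 3"],
)

# Slicing by label: rows 2 to 3 (both included) and two named columns.
subset = DF_obj.loc["row 2":"row 3", ["column 1", "column 3"]]

# Data comparison: an element-wise test against a scalar gives a boolean frame.
below_five = DF_obj < 5

# Filtering with a scalar: keep rows whose 'column 1' value exceeds 3.
filtered = DF_obj[DF_obj["column 1"] > 3]

# Setting values with a scalar: overwrite 'column 2' for the first two rows.
DF_obj.loc[["row 1", "row 2"], "column 2"] = 0
```

Label-based `.loc` slices include both endpoints, unlike ordinary Python slices.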
1.2 Handling missing data
- Figuring out what data is missing
- Filling missing values with "0"
- Filling missing values with desired values
- Filling missing values with "front-fill method"
- Counting missing values
- Filtering out missing values
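A short sketch of the missing-data steps above, assuming a small made-up dataframe with NaN gaps. The column names and fill values are hypothetical. `ffill()` is used for the front-fill step; older tutorials often write `fillna(method='ffill')`, which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps; values are illustrative only.
DF_obj = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0], "b": [np.nan, 2.0, np.nan, 8.0]}
)

# Figuring out what is missing: True wherever a value is NaN.
missing_mask = DF_obj.isnull()

# Filling missing values with 0.
filled_zero = DF_obj.fillna(0)

# Filling with desired values, chosen per column.
filled_custom = DF_obj.fillna({"a": 0.1, "b": 1.25})

# Front-fill: carry the last valid value forward.
filled_forward = DF_obj.ffill()

# Counting missing values per column.
missing_counts = DF_obj.isnull().sum()

# Filtering out rows that contain any missing value.
complete_rows = DF_obj.dropna()
```

Front-fill cannot fill a gap in the first row, because there is no earlier value to carry forward.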
1.3 Removing duplicates
- Obtaining duplicated values
DF_obj.duplicated()
- Removing duplicates
DF_obj.drop_duplicates(['column 3'])
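To see what the two calls return, here is a minimal sketch on a made-up dataframe. The data is an assumption; only `duplicated()`, `drop_duplicates()`, and the 'column 3' name come from the notes above.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical.
DF_obj = pd.DataFrame(
    {
        "column 1": [1, 1, 2, 2, 3],
        "column 2": ["a", "a", "b", "b", "c"],
        "column 3": ["A", "A", "B", "C", "C"],
    }
)

# Boolean series: True for every row that repeats an earlier row.
is_duplicate = DF_obj.duplicated()

# Drop rows that are duplicated across all columns.
deduplicated = DF_obj.drop_duplicates()

# Drop rows whose 'column 3' value has already appeared.
deduplicated_on_col3 = DF_obj.drop_duplicates(["column 3"])
```

By default the first occurrence is kept and later repeats are dropped.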
1.4 Concatenating and transforming data
- axis=1 signifies a column-wise operation
- Two dataframes side-by-side
pd.concat([DF_obj, DF_obj_2], axis=1)
- Two dataframes stacked one below the other
pd.concat([DF_obj, DF_obj_2])
- Dropping data
DF_obj.drop([0,2], axis=1)
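- Create and name a series object, then add it as a new column to our dataframe
- Use df.join to add the series as a column; to add rows instead, use pd.concat (the older df.append did this but was removed in pandas 2.0)
- Sorting values: pass by="column_name" and ascending=True (or ascending=False for descending order) to sort_values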
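The concatenation, dropping, joining, and sorting steps above might look like this in practice. This is a sketch on two made-up 4x3 dataframes; the series name `added_variable` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframes with default integer row and column labels.
DF_obj = pd.DataFrame(np.arange(12).reshape(4, 3))
DF_obj_2 = pd.DataFrame(np.arange(12, 24).reshape(4, 3))

# Side by side (axis=1 works along columns).
side_by_side = pd.concat([DF_obj, DF_obj_2], axis=1)

# One below the other (default axis=0 works along rows).
stacked = pd.concat([DF_obj, DF_obj_2])

# Dropping columns labelled 0 and 2, then rows labelled 0 and 2.
without_columns = DF_obj.drop([0, 2], axis=1)
without_rows = DF_obj.drop([0, 2])

# A named series becomes a new column when joined on the index.
series_obj = pd.Series(np.arange(4), name="added_variable")
with_new_column = DF_obj.join(series_obj)

# Sort by the new column in descending order.
sorted_desc = with_new_column.sort_values(by="added_variable", ascending=False)
```

The series must have a name for `join` to work, since the name becomes the column label.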
1.5 Grouping and data aggregation
- Grouping data by the values in a column
cars.groupby(cars['cyl'])
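A grouped object does nothing until an aggregation is applied to it. The sketch below assumes a tiny stand-in for the `cars` table; the real dataset is not reproduced here, and the `mpg` and `hp` columns and all values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the cars dataset; values are illustrative only.
cars = pd.DataFrame(
    {
        "cyl": [4, 4, 6, 6, 8],
        "mpg": [30.0, 26.0, 21.0, 19.0, 15.0],
        "hp": [60, 100, 110, 120, 200],
    }
)

# Group rows by the number of cylinders.
cars_groups = cars.groupby(cars["cyl"])

# Aggregate: mean of selected columns within each group.
mean_by_cyl = cars_groups[["mpg", "hp"]].mean()

# Several aggregations at once on a single column.
summary = cars_groups["mpg"].agg(["count", "min", "max"])
```

The group keys (here the distinct `cyl` values) become the index of the aggregated result.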
B. Data visualization
2.1 Creating standard plots
- Requires the seaborn library; run the following as-is in a notebook cell
!pip install seaborn
- Required imports
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sb
- Render plots inline, directly below the cell
%matplotlib inline
- Set figure size
rcParams['figure.figsize']=5,4
- Set figure style
sb.set_style('whitegrid')
- Creating line chart
- Plotting a line chart from a pandas object/dataframe
- Creating bar charts
- Bar charts from dataframes
- Creating pie charts
- Saving plots (as images)
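The chart types listed above can be sketched with Matplotlib and pandas alone. All data, column names, and the output file name are assumptions for illustration. The `Agg` backend is chosen so the example also runs outside a notebook; inside Jupyter, `%matplotlib inline` takes its place.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0, 4, 3, 2, 1]
df = pd.DataFrame({"mpg": [21.0, 22.8, 18.7], "hp": [110, 93, 175]})

# Line chart from plain sequences.
fig_line, ax_line = plt.subplots()
ax_line.plot(x, y)

# Line chart and bar chart straight from a dataframe (one series per column).
ax_df = df.plot()
ax_bar = df.plot(kind="bar")

# Pie chart with labelled wedges.
fig_pie, ax_pie = plt.subplots()
ax_pie.pie([1, 2, 3, 4], labels=["bicycle", "motorbike", "car", "van"])

# Saving a plot as an image file.
out_path = os.path.join(tempfile.mkdtemp(), "pie_chart.png")
fig_pie.savefig(out_path)
```

`savefig` infers the image format from the file extension, so changing `.png` to `.jpeg` or `.pdf` is enough to switch formats.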
2.2 Scaling and subplotting
- Scaling axes
ax.set_xlim([1,9])
ax.set_ylim([0,5])
- Subplotting charts (plt.subplots creates the figure itself, so a separate plt.figure() call is not needed)
fig, (ax1,ax2) = plt.subplots(1,2)
ax1.plot(x)
ax2.plot(x, y)
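Put together as a runnable sketch, with made-up `x` and `y` values (the notes do not say what data was plotted):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical data.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# Scaling axes: fix the visible range on a single chart.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlim([1, 9])
ax.set_ylim([0, 5])

# Subplotting: one row, two columns of axes in a single figure.
fig2, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x)      # x values plotted against their positions 0..8
ax2.plot(x, y)   # y plotted against x
```

When `plot` receives a single sequence, it uses the positions 0, 1, 2, ... as the horizontal coordinates, which is why the two panels look different.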
2.3 Histograms, pie charts, overlapping line charts
- Adjusting colours
- Customizing line styles
- Setting marker styles
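As a rough illustration of these three styling options, the sketch below draws two overlapping lines and a histogram. The data and the particular colours, line styles, and markers are arbitrary choices, not the ones used in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical data.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]
y2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, (ax_lines, ax_hist) = plt.subplots(1, 2)

# Overlapping line charts: colour, line style, and marker set per line.
line_a, = ax_lines.plot(x, y, color="salmon", linestyle="--", linewidth=2, marker="o")
line_b, = ax_lines.plot(x, y2, color="steelblue", linestyle=":", marker="+", markersize=10)

# Histogram with a fill colour and visible bin edges.
counts, bin_edges, bars = ax_hist.hist(values, bins=5, color="teal", edgecolor="black")
```

Named colours, hex strings such as `"#FA8072"`, and RGB tuples are all accepted by the `color` argument.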
2.4 Creating labels and annotations
- Functional method
- Object-oriented method
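The two approaches differ mainly in what the calls are made on: the functional method calls `plt` functions that act on the current figure, while the object-oriented method calls methods on an explicit axes object. A minimal sketch with made-up data and label text:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical data.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# Functional method: pyplot functions act on the current axes.
plt.figure()
plt.plot(x, y, label="series")
plt.xlabel("x values")
plt.ylabel("y values")
plt.title("Functional interface")
plt.legend(loc="best")
func_ax = plt.gca()  # handle to the axes that pyplot was drawing on

# Object-oriented method: the same labels set on an explicit axes object.
fig, ax = plt.subplots()
ax.plot(x, y, label="series")
ax.set_xlabel("x values")
ax.set_ylabel("y values")
ax.set_title("Object-oriented interface")
ax.legend(loc="best")

# Annotation: text at xytext with an arrow pointing to the data point at xy.
note = ax.annotate(
    "dip",
    xy=(5, 0.5),
    xytext=(6, 2),
    arrowprops=dict(facecolor="black", shrink=0.05),
)
```

The object-oriented form tends to be easier to follow once a figure has more than one set of axes, since each call names the axes it changes.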
2.5 Time series
- Sample and plot data
- Check out the next repo for the analysis
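The sample-and-plot step could look like the sketch below. The notes do not name the dataset, so a synthetic daily series stands in for it; the `quantity` column, the date range, the sample size, and the random seeds are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily data indexed by date.
dates = pd.date_range("2020-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
sales = pd.DataFrame({"quantity": rng.integers(1, 50, size=365)}, index=dates)

# Take a random sample of rows, then restore chronological order for plotting.
sampled = sales.sample(n=100, random_state=25).sort_index()

# Plot the sampled series; pandas puts the dates on the x axis.
ax = sampled["quantity"].plot()
ax.set_xlabel("Order date")
ax.set_ylabel("Order quantity")
ax.set_title("Sampled daily order quantity")
```

Sampling thins a dense series so the line stays readable; fixing `random_state` makes the same rows come back on every run.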