Data mining in Python
Data mining
- Data importing
- Cleaning
- Transforming
- Visualization
- Summary statistics
Data mining using NumPy, pandas, Matplotlib
All code is in a Jupyter notebook
A. Data munging
1. Selecting and retrieving data 1.1
- Creating dataframe
- Slicing dataframe
- Data comparison
- Filtering with scalars
- Setting values with scalars
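The selection steps listed above can be sketched roughly as follows. The dataframe contents and the row and column labels are made up for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x6 dataframe of random numbers; labels are illustrative only.
rng = np.random.default_rng(25)
DF_obj = pd.DataFrame(
    rng.random(36).reshape(6, 6),
    index=[f"row {i}" for i in range(1, 7)],
    columns=[f"column {i}" for i in range(1, 7)],
)

# Selecting by label: two rows and two columns, in the order requested.
subset = DF_obj.loc[["row 2", "row 5"], ["column 5", "column 2"]]

# Slicing by label includes both endpoints.
sliced = DF_obj.loc["row 3":"row 5"]

# Data comparison returns a boolean dataframe of the same shape.
mask = DF_obj < 0.2

# Filtering with a scalar: keep rows where 'column 1' exceeds 0.5.
filtered = DF_obj[DF_obj["column 1"] > 0.5]

# Setting values with a scalar: overwrite three cells of 'column 1' with 8.
DF_obj.loc[["row 1", "row 5", "row 6"], "column 1"] = 8
```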
2. Handling missing data 1.2
- Figuring out what data is missing
- Filling missing values with "0"
- Filling missing values with desired values
- Filling missing values with "front-fill method"
- Counting missing values
- Filtering out missing values
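A minimal sketch of the missing-data steps above, assuming a small made-up dataframe. The fill values 0.1 and 1.25 are arbitrary placeholders.

```python
import numpy as np
import pandas as pd

# Illustrative dataframe with a few missing entries.
DF_obj = pd.DataFrame({
    "column 1": [1.0, np.nan, 3.0, np.nan],
    "column 2": [np.nan, 5.0, 6.0, 7.0],
})

# Figuring out what data is missing: True marks a missing cell.
missing_mask = DF_obj.isnull()

# Filling missing values with 0.
filled_zero = DF_obj.fillna(0)

# Filling missing values with a desired value per column.
filled_custom = DF_obj.fillna({"column 1": 0.1, "column 2": 1.25})

# Front-fill: carry the last valid value forward down each column.
# A leading NaN has nothing before it, so it stays missing.
filled_forward = DF_obj.ffill()

# Counting missing values per column.
missing_counts = DF_obj.isnull().sum()

# Filtering out missing values: drop rows (default) or columns (axis=1)
# that contain at least one NaN.
complete_rows = DF_obj.dropna()
complete_cols = DF_obj.dropna(axis=1)
```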
3. Removing duplicates 1.3
- Obtaining duplicated values
DF_obj.duplicated()
- Removing duplicates
DF_obj.drop_duplicates(['column 3'])
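As a rough illustration of the two calls above, here is a sketch on a made-up dataframe. Only the name `DF_obj` and the label 'column 3' come from the outline; the data is hypothetical.

```python
import pandas as pd

# Illustrative dataframe with some repeated rows.
DF_obj = pd.DataFrame({
    "column 1": [1, 1, 2, 2, 3, 3, 3],
    "column 2": ["a", "a", "b", "b", "c", "d", "c"],
    "column 3": ["A", "A", "B", "B", "C", "C", "C"],
})

# True for every row that repeats an earlier row in full.
dup_flags = DF_obj.duplicated()

# Drop rows that are complete duplicates, keeping the first occurrence.
deduped_rows = DF_obj.drop_duplicates()

# Drop rows based on 'column 3' alone: one row survives per distinct value.
deduped_by_col = DF_obj.drop_duplicates(["column 3"])
```

Deduplicating on a single column is more aggressive than deduplicating on whole rows: here the row with 'd' in 'column 2' survives the first call but not the second.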
4. Concatenating and transforming data 1.4
axis=1 signifies a column operation
- Two dataframes side-by-side
pd.concat([DF_obj, DF_obj_2], axis=1)
- Two dataframes one below the other
pd.concat([DF_obj, DF_obj_2])
- Dropping data
DF_obj.drop([0,2], axis=1)
- Create and name a series object -> add it as a new column to our dataframe
- This can be done in two ways: either by df.join or by df.append (df.append was removed in pandas 2.0; use pd.concat instead)
- Sorting values
DF_obj.sort_values(by="column_name", ascending=True), with ascending=False for descending order
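The operations in this section can be sketched together as below. The names `DF_obj` and `DF_obj_2` follow the outline, while the data, the series name `added_variable`, and the sort column are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative dataframes with default integer row and column labels.
DF_obj = pd.DataFrame(np.arange(36).reshape(6, 6))
DF_obj_2 = pd.DataFrame(np.arange(15).reshape(5, 3))

# Side by side: rows are aligned on the index, columns are appended.
side_by_side = pd.concat([DF_obj, DF_obj_2], axis=1)

# One below the other: columns are aligned, rows are appended.
stacked = pd.concat([DF_obj, DF_obj_2])

# Dropping data: axis=1 drops the columns labelled 0 and 2,
# the default axis=0 drops the rows labelled 0 and 2.
without_cols = DF_obj.drop([0, 2], axis=1)
without_rows = DF_obj.drop([0, 2])

# Create and name a series, then add it as a new column via join.
series_obj = pd.Series(np.arange(6), name="added_variable")
with_new_col = DF_obj.join(series_obj)

# Sort by the column labelled 5, largest values first.
sorted_desc = DF_obj.sort_values(by=5, ascending=False)
```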
5. Grouping and data aggregation 1.5
- Grouping data by column index
cars.groupby(cars['cyl'])
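A grouped object does nothing until an aggregation is applied to it. The sketch below assumes a small stand-in for the `cars` dataframe with illustrative values; the column names `car_names` and `mpg` are assumptions, only `cyl` appears in the outline.

```python
import pandas as pd

# Small stand-in for the cars dataframe; values are illustrative.
cars = pd.DataFrame({
    "car_names": ["Mazda RX4", "Datsun 710", "Hornet 4 Drive",
                  "Valiant", "Duster 360", "Merc 240D"],
    "mpg": [21.0, 22.8, 21.4, 18.1, 14.3, 24.4],
    "cyl": [6, 4, 6, 6, 8, 4],
})

# Group rows by number of cylinders.
cars_groups = cars.groupby(cars["cyl"])

# Aggregate one numeric column per group.
mean_mpg = cars_groups["mpg"].mean()

# Number of rows in each group.
group_sizes = cars_groups.size()
```

Selecting the numeric column before calling `mean()` avoids errors from the text column in recent pandas versions.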
B. Data visualization
1. Creating standard plots 2.1
- Requires the seaborn library
- Run the code as it is in a Python notebook
! pip install seaborn
- Needed imports
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sb
- Instant plot results
%matplotlib inline
- Set figure size
rcParams['figure.figsize']=5,4
- Set figure style
sb.set_style('whitegrid')
- Creating line chart
- Plotting a line chart from a pandas object/dataframe
- Creating bar charts
- Bar charts from dataframes
- Creating pie charts
- Saving plots (as images)
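The chart types listed above might look like this in code. This is a sketch with made-up data and a hypothetical output file name, `pie_chart.png`; it leaves out seaborn styling and selects a non-interactive backend so it also runs outside a notebook (in Jupyter, `%matplotlib inline` takes that role).

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import rcParams

rcParams["figure.figsize"] = 5, 4

# Illustrative data, not taken from the notebook.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0, 4, 3, 2, 1]
df = pd.DataFrame({"mpg": [21.0, 22.8, 21.4, 18.1], "cyl": [6, 4, 6, 6]})

# Line chart from plain lists.
fig_line, ax_line = plt.subplots()
ax_line.plot(x, y)

# Line chart from a dataframe: one line per column.
fig_df, ax_df = plt.subplots()
df.plot(ax=ax_df)

# Bar chart from plain lists, then from a dataframe column.
fig_bar, ax_bar = plt.subplots()
ax_bar.bar(x, y)
fig_df_bar, ax_df_bar = plt.subplots()
df["mpg"].plot(kind="bar", ax=ax_df_bar)

# Pie chart with one wedge per value.
fig_pie, ax_pie = plt.subplots()
ax_pie.pie([1, 2, 3, 4, 0.5], labels=["a", "b", "c", "d", "e"])

# Saving a plot as an image in the current working directory.
out_path = Path("pie_chart.png")
fig_pie.savefig(out_path)
```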
2. Scaling and subplotting 2.2
- Scaling axes
ax.set_xlim([1,9]) and ax.set_ylim([0,5])
- Subplotting charts:
fig = plt.figure()
fig, (ax1,ax2) = plt.subplots(1,2)
ax1.plot(x)
ax2.plot(x, y)
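The snippets above leave `x`, `y`, and `ax` undefined. A self-contained version, with made-up data, could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative data.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# Scaling axes: plot first, then fix the visible range.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlim([1, 9])
ax.set_ylim([0, 5])

# Subplotting: one row, two columns, each with its own axes object.
fig2, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x)      # x values plotted against their positions 0..8
ax2.plot(x, y)   # y plotted against x
```

`plt.subplots` already creates the figure, so a separate `plt.figure()` call beforehand is not required.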
3. Histograms, pie-charts, overlapping line charts 2.3
1. Adjusting colours
2. Customizing line styles
3. Marker styles
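A short sketch of these three customizations on overlapping line charts. The data and the specific colours, line styles, and markers are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative data for two overlapping lines.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]
y1 = [9, 8, 7, 6, 5, 4, 3, 2, 1]

fig, ax = plt.subplots()

# Colour, line style, line width, and marker are all keyword arguments.
ax.plot(x, y, color="salmon", linestyle="--", linewidth=2, marker="o")
ax.plot(x, y1, color="steelblue", linestyle=":", linewidth=3, marker="+")

# Colours can also be given per bar (or per pie wedge) as a list.
fig_bar, ax_bar = plt.subplots()
bars = ax_bar.bar(x, y, color=["darkgray", "lightsalmon", "powderblue"] * 3)
```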
4. Creating labels and annotations 2.4
1. Functional method:
2. Object oriented annotation:
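The two styles differ mainly in whether labels are set through `plt` on the current axes or through an explicit axes object. The sketch below assumes made-up data and label text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Illustrative data.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# Functional method: pyplot functions act on the current figure and axes.
plt.figure()
plt.bar(x, y)
plt.xlabel("your x-axis label")
plt.ylabel("your y-axis label")
func_ax = plt.gca()  # handle to the axes the calls above modified

# Object-oriented method: call methods on an explicit axes object.
fig, ax = plt.subplots()
ax.plot(x, y, label="sample series")
ax.set_xlabel("your x-axis label")
ax.set_ylabel("your y-axis label")
ax.legend(loc="best")

# Annotate the point (4, 4) with text and an arrow pointing at it.
ax.annotate(
    "local peak",
    xy=(4, 4),
    xytext=(6, 4.5),
    arrowprops=dict(facecolor="black", shrink=0.05),
)
```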
5. Time series 2.5
- Sample and plot data
- See the next repo for the analysis
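Sampling and plotting a time series might look like the sketch below. The daily series is synthetic, and the column name `sales`, the sample size, and the random seeds are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series indexed by date.
dates = pd.date_range("2020-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
sales = pd.DataFrame({"sales": rng.normal(100, 15, size=365)}, index=dates)

# Draw a random sample of rows, then restore chronological order.
sample = sales.sample(n=50, random_state=25).sort_index()

# Plot the sampled points against their dates.
fig, ax = plt.subplots()
sample.plot(ax=ax)
ax.set_xlabel("date")
ax.set_ylabel("sales")
```

Plotting a sample rather than the full series keeps a dense line chart readable, at the cost of hiding short-lived spikes.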





