Data mining in Python

Data mining

  1. Data importing
  2. Cleaning
  3. Transforming
  4. Visualization
  5. Summary statistics

Data mining using NumPy, pandas, and matplotlib

All code is in a Jupyter notebook

A. Data munging

1. Selecting and retrieving data 1.1

  1. Creating a dataframe
  2. Slicing a dataframe
  3. Data comparison
  4. Filtering with scalars
  5. Setting values with scalars
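
The five steps above can be sketched roughly as follows. The random values, the row and column labels, and the thresholds are assumptions made for illustration; only the name DF_obj comes from this outline.

```python
import numpy as np
import pandas as pd

# Creating a dataframe: reproducible toy data with illustrative labels.
rng = np.random.default_rng(25)
DF_obj = pd.DataFrame(
    rng.random((6, 3)),
    index=["row 1", "row 2", "row 3", "row 4", "row 5", "row 6"],
    columns=["column 1", "column 2", "column 3"],
)

# Slicing: label-based selection of two rows and two columns.
subset = DF_obj.loc[["row 2", "row 5"], ["column 1", "column 3"]]

# Data comparison: an element-wise boolean mask.
mask = DF_obj < 0.2

# Filtering with a scalar: keep rows whose 'column 1' exceeds 0.5.
filtered = DF_obj[DF_obj["column 1"] > 0.5]

# Setting values with a scalar: overwrite one column in three rows.
DF_obj.loc[["row 1", "row 3", "row 6"], "column 2"] = 8
```

Label-based .loc is used here for both reading and writing, which avoids chained-assignment surprises.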

2. Handling missing data 1.2

  1. Figuring out what data is missing
  2. Filling missing values with "0"
  3. Filling missing values with desired values
  4. Filling missing values with the forward-fill method (ffill)
  5. Counting missing values
  6. Filtering out missing values
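
A minimal sketch of these six steps, assuming a small made-up dataframe with two columns named a and b. The fill values 0.1 and 1.25 are arbitrary placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data with a few gaps.
DF_obj = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, np.nan],
        "b": [np.nan, 5.0, 6.0, 7.0],
    }
)

missing_mask = DF_obj.isnull()                         # True where a value is missing
filled_zero = DF_obj.fillna(0)                         # fill every gap with 0
filled_custom = DF_obj.fillna({"a": 0.1, "b": 1.25})   # per-column fill values
filled_forward = DF_obj.ffill()                        # carry the last valid value forward
missing_counts = DF_obj.isnull().sum()                 # number of gaps per column
complete_rows = DF_obj.dropna()                        # keep only rows without gaps
```

Forward fill cannot fill a gap in the first row, because there is no earlier value to carry forward.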

3. Removing duplicates 1.3

  1. Obtaining duplicated values DF_obj.duplicated()
  2. Removing duplicates DF_obj.drop_duplicates(['column 3'])
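
As an illustration of both calls, assuming a small made-up dataframe whose column names mirror the 'column 3' used above:

```python
import pandas as pd

# Made-up data: row 1 repeats row 0, and 'column 3' repeats "C" in rows 3 and 4.
DF_obj = pd.DataFrame(
    {
        "column 1": [1, 1, 2, 2, 3],
        "column 2": ["a", "a", "b", "b", "c"],
        "column 3": ["A", "A", "B", "C", "C"],
    }
)

dup_flags = DF_obj.duplicated()                       # True for rows that repeat an earlier row
deduped_rows = DF_obj.drop_duplicates()               # drop fully duplicated rows
deduped_col3 = DF_obj.drop_duplicates(["column 3"])   # compare on 'column 3' only
```

By default both methods keep the first occurrence and treat later ones as duplicates.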

4. Concatenating and transforming data 1.4

  • axis=1 signifies column operation
  1. Two dataframes side-by-side pd.concat([DF_obj, DF_obj_2], axis=1)
  2. Two dataframes one below the other pd.concat([DF_obj, DF_obj_2])
  3. Dropping data DF_obj.drop([0,2], axis=1)
  4. Create and name a Series object, then add it as a new column to the dataframe
  5. Columns are added with DF_obj.join; DataFrame.append added rows instead and was removed in pandas 2.0, where pd.concat replaces it
  6. Sorting values DF_obj.sort_values(by='column_name', ascending=False)
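
The operations in this list could look like the sketch below. The array contents and the Series name added_variable are assumptions for illustration, and pd.concat with ignore_index=True stands in for the older append call.

```python
import numpy as np
import pandas as pd

# Made-up frames with default integer row and column labels.
DF_obj = pd.DataFrame(np.arange(36).reshape(6, 6))
DF_obj_2 = pd.DataFrame(np.arange(15).reshape(5, 3))

side_by_side = pd.concat([DF_obj, DF_obj_2], axis=1)   # columns next to each other
stacked = pd.concat([DF_obj, DF_obj_2])                # rows one below the other

without_cols = DF_obj.drop([0, 2], axis=1)             # drop columns 0 and 2

# A named Series becomes a new column when joined on the index.
series_obj = pd.Series(np.arange(6), name="added_variable")
with_new_col = DF_obj.join(series_obj)

# Appending rows: pd.concat replaces the removed DataFrame.append.
stacked_twice = pd.concat([with_new_col, with_new_col], ignore_index=True)

# Sorting by column 5 in descending order.
sorted_desc = DF_obj.sort_values(by=5, ascending=False)
```

Where the two frames do not line up (the shorter frame in side_by_side, the missing columns in stacked), pandas fills the gaps with NaN.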

5. Grouping and data aggregation 1.5

  1. Grouping data by a column's values cars.groupby(cars['cyl'])
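
A rough sketch of grouping followed by aggregation. The cars table below is a small made-up stand-in, not the dataset used in the notebook; only the names cars and cyl come from this outline.

```python
import pandas as pd

# Made-up stand-in for the cars table; values are illustrative only.
cars = pd.DataFrame(
    {
        "cyl": [4, 4, 6, 6, 8, 8],
        "mpg": [30.0, 26.0, 21.0, 19.0, 15.0, 13.0],
        "hp": [60, 90, 110, 120, 200, 240],
    }
)

# Group rows by the number of cylinders, then average the selected columns.
cars_groups = cars.groupby(cars["cyl"])
group_means = cars_groups[["mpg", "hp"]].mean()
```

The result has one row per distinct cyl value, indexed by that value.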

B. Data visualization

1. Creating standard plots 2.1

  1. Requires the seaborn library
  2. Install it from a notebook cell with !pip install seaborn
  3. Required import: import matplotlib.pyplot as plt
  4. Inline plot output %matplotlib inline
  5. Set figure size rcParams['figure.figsize']=5,4
  6. Set figure style sb.set_style('whitegrid')
  7. Creating line chart
  8. Plotting a line chart from a pandas object/dataframe
  9. Creating bar charts
  10. Bar charts from dataframes
  11. Creating pie charts
  12. Saving plots (as images)
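
The chart types listed above can be sketched as below. The data values, the pie labels, and the output file name are made up for illustration. The Agg backend line is an assumption so the sketch also runs outside a notebook; inside a notebook it would be replaced by %matplotlib inline. The seaborn style call is left out to keep the sketch dependent on matplotlib and pandas only.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook use %matplotlib inline instead
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import rcParams

rcParams["figure.figsize"] = 5, 4  # default figure size in inches

# Made-up values for illustration only.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0, 4, 3, 2, 1]

fig, (ax_line, ax_bar, ax_pie) = plt.subplots(1, 3, figsize=(12, 4))
ax_line.plot(x, y)                                      # line chart from plain lists
ax_bar.bar(x, y)                                        # bar chart from plain lists
ax_pie.pie([1, 2, 3, 4], labels=["a", "b", "c", "d"])   # pie chart

# The same data as a pandas object: plot() draws a line chart by default,
# and kind="bar" switches to a bar chart.
series = pd.Series(y, index=x, name="y")
fig_df, (ax_s, ax_d) = plt.subplots(1, 2, figsize=(8, 4))
series.plot(ax=ax_s)
series.to_frame().plot(kind="bar", ax=ax_d)

# Saving a plot as an image; the file name is hypothetical.
out_path = os.path.join(tempfile.gettempdir(), "standard_plots.png")
fig.savefig(out_path)
```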

2. Scaling and subplotting 2.2

  1. Scaling axes ax.set_xlim([1,9]) and ax.set_ylim([0,5])
  2. Subplotting charts:
fig, (ax1, ax2) = plt.subplots(1, 2)  # one row, two columns; no separate plt.figure() call is needed
ax1.plot(x)  # y-values only, plotted against their positions
ax2.plot(x, y)
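
Put together with data, the two steps might look like this. The x and y values are made up, and the Agg backend line is an assumption so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

# Made-up values for illustration only.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# One row, two columns of axes in a single figure.
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x)       # a single sequence is treated as y-values
ax2.plot(x, y)

# Scaling: fix the visible range of the second chart.
ax2.set_xlim([1, 9])
ax2.set_ylim([0, 5])
```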

3. Histograms, pie-charts, overlapping line charts 2.3

  1. Adjusting colours
  2. Customizing line styles
  3. Marker styles
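
A sketch of how colours, line styles, and markers might be set on overlapping line charts, a histogram, and a pie chart. All data values and colour choices here are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

# Made-up values for illustration only.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]
y1 = [9, 8, 7, 6, 5, 4, 3, 2, 1]

fig, (ax_lines, ax_hist, ax_pie) = plt.subplots(1, 3, figsize=(12, 4))

# Overlapping line charts: colour, line style (ls), line width (lw), marker.
line_a, = ax_lines.plot(x, y, color="salmon", ls="--", lw=2, marker="o")
line_b, = ax_lines.plot(x, y1, color="steelblue", ls=":", lw=2, marker="+", mew=3)

# Histogram with a single named colour.
counts, bin_edges, _ = ax_hist.hist(y + y1, bins=5, color="slategray")

# Pie chart with one hex colour per wedge.
ax_pie.pie([1, 2, 3, 4], colors=["#A9A9A9", "#FFA07A", "#B0E0E6", "#FFE4C4"])
```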

4. Creating labels and annotations 2.4

  1. Functional method
  2. Object-oriented method
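
The two approaches might differ as sketched below: the functional method calls pyplot functions that act on the current axes, while the object-oriented method calls methods on an explicit axes object. The data, label text, and annotation position are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

# Made-up values for illustration only.
x = list(range(1, 10))
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# Functional method: pyplot calls act on the current figure and axes.
plt.figure()
plt.bar(x, y, label="values")
plt.xlabel("x label")
plt.ylabel("y label")
plt.legend(loc="best")
functional_ax = plt.gca()

# Object-oriented method: the same steps as methods on an explicit axes.
fig, ax = plt.subplots()
ax.plot(x, y, label="values")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.legend(loc="best")

# Annotation: text at xytext with an arrow pointing to the data point xy.
note = ax.annotate(
    "minimum",
    xy=(5, 0.5),
    xytext=(6, 1.5),
    arrowprops=dict(facecolor="black", shrink=0.05),
)
```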

5. Time series 2.5

  1. Sample and plot data
  2. Check out next repo for analysis
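
A possible sketch of the first step, assuming that "sample" means drawing a random subset of rows with DataFrame.sample. The daily series below is randomly generated for illustration, and the column name value is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily series: a random walk over one year.
rng = np.random.default_rng(25)
dates = pd.date_range("2020-01-01", periods=365, freq="D")
df = pd.DataFrame({"value": rng.normal(size=365).cumsum()}, index=dates)

# Draw 100 random rows, then restore chronological order before plotting.
sampled = df.sample(n=100, random_state=25).sort_index()

fig, ax = plt.subplots()
sampled["value"].plot(ax=ax)
ax.set_xlabel("date")
ax.set_ylabel("value")
```

Sorting by index after sampling matters, because sample returns rows in random order and a line chart would otherwise zigzag across dates.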

6. Density, boxplots, summary statistics 2.6


  1. Density plot sb.distplot(mpg) (deprecated since seaborn 0.11 in favour of sb.histplot and sb.displot)
  2. Scatter plot
  3. Regression plot
  4. Pairplot
  5. Pair plot with column colouring
  6. Boxplots cars.boxplot(column='mpg', by='am')
