utils module
Utils for ML
-
utils.about_dataframe(df)
Describe a DataFrame and show its information.
Parameters: df (DataFrame) – Pandas DataFrame to describe
-
utils.columns_info(df, cat_count_threshold, show_group_counts=False)
Prints and returns column info for a given DataFrame.
Parameters: - df (DataFrame) – Pandas DataFrame
- cat_count_threshold (int) – If a column has fewer unique values than this threshold, it is tagged as 'categorical'
- show_group_counts (boolean) – If True, prints the individual group counts for each column
Example
>>> object_cat_cols, numeric_cat_cols, numeric_cols = utils.columns_info(
...     data,
...     cat_count_threshold=5,
...     show_group_counts=True)
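The threshold rule described above can be sketched with plain pandas. This is only an illustration, not the module's implementation: the helper name `split_columns_by_type` and the sample data are hypothetical, and the assumption that non-numeric columns always count as categorical is ours.

```python
import pandas as pd


def split_columns_by_type(df, cat_count_threshold):
    """Hypothetical helper: group column names by dtype and unique-value count."""
    object_cat_cols, numeric_cat_cols, numeric_cols = [], [], []
    for name in df.columns:
        if not pd.api.types.is_numeric_dtype(df[name]):
            # Assumption: every non-numeric column is treated as categorical
            object_cat_cols.append(name)
        elif df[name].nunique() < cat_count_threshold:
            # Numeric column with few distinct values: tag as categorical
            numeric_cat_cols.append(name)
        else:
            numeric_cols.append(name)
    return object_cat_cols, numeric_cat_cols, numeric_cols


# Made-up sample data for illustration only
data = pd.DataFrame({
    "Gender": ["male", "female", "female", "male", "male", "female"],
    "Diabetes": [0, 1, 0, 0, 1, 0],
    "Age": [34, 51, 29, 62, 45, 38],
})
object_cat_cols, numeric_cat_cols, numeric_cols = split_columns_by_type(
    data, cat_count_threshold=5)
print(object_cat_cols, numeric_cat_cols, numeric_cols)
```

With a threshold of 5, a column needs at least five distinct values to stay in the plain numeric group.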
-
utils.count_compare_plots(df1, df1_title, df2, df2_title, column, **kwargs)
Show count plots of two DataFrames for comparison.
Can be used to compare how filling NA values affects the distribution of a column.
Args:
Example
The example below uses the NHANES dataset.
>>> for each_column in object_cat_columns:
...     data[each_column] = data[each_column].fillna(
...         data.groupby(['Gender'])[each_column].ffill())
>>> for each_column in object_cat_columns:
...     str_count_of_nas = str(len(
...         data_raw.index[data_raw.isnull()[each_column]]))
...     str_count_of_nas = ' (Count of NAs:' + str_count_of_nas + ')'
...     utils.count_compare_plots(df1=data_raw,
...                               df1_title='Before Fill-NA' + str_count_of_nas,
...                               df2=data,
...                               df2_title='After Fill-NA',
...                               column=each_column,
...                               height=4,
...                               aspect=1.5,
...                               hue_column='Diabetes',
...                               split_plots_by='Gender')
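The fill step in the example above (a forward fill within each Gender group) can be checked without any plotting. The sketch below uses a small made-up frame in place of the NHANES data; the `Smoker` column and its values are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the raw data; "Smoker" is a hypothetical column
data_raw = pd.DataFrame({
    "Gender": ["male", "male", "female", "female", "male"],
    "Smoker": ["yes", None, "no", None, None],
})

data = data_raw.copy()
# Forward-fill missing values within each Gender group, as in the example above
data["Smoker"] = data["Smoker"].fillna(
    data.groupby(["Gender"])["Smoker"].ffill())

# Compare the value distribution before and after the fill
print(data_raw["Smoker"].value_counts(dropna=False))
print(data["Smoker"].value_counts(dropna=False))
```

A group whose first row is null keeps that null, because a forward fill has no earlier value to copy.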
-
utils.count_plots(df, columns, **kwargs)
Count plots using seaborn.
Displays count plots for the given columns in a DataFrame.
Parameters: - df (DataFrame) – Pandas DataFrame
- columns (array-like) – Columns for which count plots are shown
- **kwargs – Keyword arguments.
Keyword Arguments: - hue_column (str) – Color
- split_plots_by (str) – Split seaborn FacetGrid by column, such as Gender
- height (float) – Sets the height of the plot
- aspect (float) – Determines the width of the plot based on height
Example
>>> utils.count_plots(data, object_cat_cols, height=4, aspect=1.5)
-
utils.dist_plots(df, columns, **kwargs)
Dist plots using seaborn.
Parameters: - df (DataFrame) – Pandas DataFrame.
- columns ([str]) – Plot only for selected columns.
- **kwargs – Keyword arguments.
Keyword Arguments: - hue_column (str) – Color
- split_plots_by (str) – Split seaborn FacetGrid by column, such as Gender
- height (float) – Sets the height of the plot
- aspect (float) – Determines the width of the plot based on height
Example
>>> utils.dist_plots(data, numeric_cols, height=4, aspect=1.5,
...                  hue_column='class', kde=False)
Returns: None
-
utils.do_cross_validate(X, y, estimator_type, estimator, cv, **kwargs)
Cross-validate an estimator (sklearn).
Args:
Example
>>> cv_iterator = ShuffleSplit(n_splits=2, test_size=0.2, random_state=31)
>>> cv_results = utils.do_cross_validate(X_train,
...                                      y_train,
...                                      'Classification',
...                                      'DecisionTreeClassifier',
...                                      cv=cv_iterator,
...                                      kernel='rbf',
...                                      C=1,
...                                      gamma=0.01)
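For orientation, the same kind of run can be written directly against scikit-learn. This is a sketch of the underlying technique, not of how utils.do_cross_validate is implemented. The iris dataset, the `max_depth` setting, and the accuracy scoring are assumptions chosen for illustration, and nothing here shows how the wrapper forwards its keyword arguments.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for X_train / y_train
X, y = load_iris(return_X_y=True)

# Same splitter settings as in the example above
cv_iterator = ShuffleSplit(n_splits=2, test_size=0.2, random_state=31)

# Assumption: a shallow decision tree; hyperparameters are illustrative
estimator = DecisionTreeClassifier(max_depth=3, random_state=31)

cv_results = cross_validate(estimator, X, y, cv=cv_iterator, scoring="accuracy")
print(cv_results["test_score"])  # one accuracy value per split
```

`cross_validate` returns a dict of arrays (fit time, score time, test score), with one entry per split.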
-
utils.do_feature_selection(X, y, method, num_of_features=None)
Args:
-
utils.do_outlier_detection(df, target_column, outlier_classes, method, **kwargs)
-
utils.do_scaling(df, method, columns_to_scale=[])
Scale data using the specified method.
Only the columns passed in columns_to_scale are scaled.
Parameters: - df (DataFrame) – Pandas DataFrame
- columns_to_scale (array-like) – List of columns that will be scaled
Returns: df (DataFrame)
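Scaling only selected columns can be sketched as follows. The accepted values of `method` are not documented here, so this sketch assumes standardisation with scikit-learn's `StandardScaler`. The helper name `scale_columns` and the sample data are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_columns(df, columns_to_scale):
    """Hypothetical helper: standardise only the listed columns."""
    scaled = df.copy()  # leave the caller's DataFrame untouched
    scaled[columns_to_scale] = StandardScaler().fit_transform(
        scaled[columns_to_scale])
    return scaled


# Made-up sample data for illustration only
data = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Gender": ["m", "f", "f"]})
scaled = scale_columns(data, ["Age"])
print(scaled)
```

Columns that are not listed, such as `Gender` here, pass through unchanged.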
-
utils.encode_columns(df, method, columns=[])
Args:
-
utils.fill_null_values(df, column, value, row_index)
Fill null values in a DataFrame column.
Parameters: - df (DataFrame) – Pandas DataFrame that will be updated
- column (str) – Column in the target DataFrame that will be updated
- value (Union[int, str, object]) – New value that will replace null values
- row_index (Union[Index, array-like]) – Index of rows to be updated
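A label-based assignment is one way to get the behaviour described by these parameters. The sketch below is an assumption about the approach, not the module's code. The helper name `fill_nulls_at` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_nulls_at(df, column, value, row_index):
    """Hypothetical helper: overwrite `column` at `row_index` in place."""
    df.loc[row_index, column] = value


# Made-up sample data for illustration only
data = pd.DataFrame({"Age": [34.0, np.nan, 29.0, np.nan]})

# Build the index of rows whose Age is null, then fill just those rows
rows_with_nulls = data.index[data["Age"].isnull()]
fill_nulls_at(data, "Age", 40.0, rows_with_nulls)
print(data)
```

Because the update uses `df.loc`, the DataFrame passed in is modified directly rather than copied.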
-
utils.get_X_and_y(df, y_column)
Splits a pd.DataFrame into X (predictors) and y (response).
Parameters: - df (DataFrame) – Pandas DataFrame
- y_column (str) – The response column name
Returns: - X (DataFrame) – All columns except the response
- y (Series) – Only the response column from the DataFrame
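The split can be illustrated with two pandas operations. This is a minimal sketch under the assumption that the function does nothing beyond dropping and selecting the response column. The helper name `split_X_and_y` and the sample data are hypothetical.

```python
import pandas as pd


def split_X_and_y(df, y_column):
    """Hypothetical helper: separate predictors from the response column."""
    X = df.drop(columns=[y_column])  # every column except the response
    y = df[y_column]                 # the response as a Series
    return X, y


# Made-up sample data for illustration only
data = pd.DataFrame({
    "Age": [34, 51, 29],
    "BMI": [22.1, 30.4, 25.0],
    "Diabetes": [0, 1, 0],
})
X, y = split_X_and_y(data, "Diabetes")
print(list(X.columns), y.name)
```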
-
utils.get_dataframe_from_array(data_array, columns)
Convert an ndarray to a pd.DataFrame with the given list of columns.
Parameters: - data_array (ndarray) – Array to convert to pd.DataFrame
- columns (array-like) – Column names for the pd.DataFrame
Returns: pd.DataFrame
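A conversion like this is typically needed after a scikit-learn transformer returns a bare ndarray and the column names are lost. The equivalent pandas call is shown below as a sketch. The array contents and column names are made up.

```python
import numpy as np
import pandas as pd

# Made-up array, e.g. the output of a scaler or feature selector
data_array = np.array([[1.0, 2.0], [3.0, 4.0]])

# Re-attach column names by building a DataFrame from the array
frame = pd.DataFrame(data_array, columns=["Age", "BMI"])
print(frame)
```

The number of column names must match the number of columns in the array, otherwise pandas raises a `ValueError`.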
-
utils.kde_compare_plots(df1, df1_title, df2, df2_title, column, **kwargs)
Args:
-
utils.kde_plots(df, columns, **kwargs)
KDE plots using seaborn.
Parameters: - df (DataFrame) – Pandas DataFrame
- columns ([str]) – Plot only for selected columns.
- **kwargs – Keyword arguments.
Keyword Arguments: - hue_column – Column used for color coding
- split_plots_by – Split seaborn FacetGrid by column, for example Gender
- height – Sets the height of the plot
- aspect – Determines the width of the plot based on height
Example
>>> utils.kde_plots(data, numeric_cols, height=4, aspect=1.5,
...                 hue_column='class')
-
utils.null_values_info(df)
Show null value information of a DataFrame.
Parameters: df (DataFrame) – Pandas DataFrame for which null values should be displayed
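The exact output of this function is not documented here. One plausible form of "null value information" is a per-column count and percentage, sketched below with plain pandas. The helper name `null_summary`, its column labels, and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def null_summary(df):
    """Hypothetical helper: null count and percentage per column."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "null_count": counts,
        "null_percent": 100 * counts / len(df),
    })
    # Keep only the columns that actually contain nulls
    return summary[summary["null_count"] > 0]


# Made-up sample data for illustration only
data = pd.DataFrame({
    "Age": [34, np.nan, 29, np.nan],
    "Gender": ["m", "f", None, "f"],
    "BMI": [22.1, 30.4, 25.0, 27.3],
})
summary = null_summary(data)
print(summary)
```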
-
utils.plot_decision_boundary(x_axis_data, y_axis_data, response, estimator, x_axis_column=None, y_axis_column=None)
Plots the decision boundary.
Args:
-
utils.plot_roc_curve_binary_class(y_true, y_pred)
Args:
-
utils.plot_roc_curve_multiclass(estimator, X_train, X_test, y_train, y_test, classes)
Args:
-
utils.print_confusion_matrix(y_true, y_pred)
Prints the confusion matrix with column and index labels.
Parameters: - y_true (Union[ndarray, pd.Series]) – Actual response
- y_pred (Union[ndarray, pd.Series]) – Predicted response
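A labelled confusion matrix can be built by wrapping scikit-learn's `confusion_matrix` in a DataFrame. This sketch only illustrates the idea. The helper name `labelled_confusion_matrix`, the `actual_` / `predicted_` label format, and the sample responses are assumptions.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix


def labelled_confusion_matrix(y_true, y_pred):
    """Hypothetical helper: confusion matrix with row and column labels."""
    labels = sorted(set(y_true) | set(y_pred))
    matrix = confusion_matrix(y_true, y_pred, labels=labels)
    # Rows are actual classes, columns are predicted classes
    return pd.DataFrame(
        matrix,
        index=[f"actual_{label}" for label in labels],
        columns=[f"predicted_{label}" for label in labels],
    )


# Made-up responses for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
cm = labelled_confusion_matrix(y_true, y_pred)
print(cm)
```

Reading along a row shows how the samples of one actual class were distributed across the predicted classes.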
-
utils.print_func(value_to_print, mode=None)
Display or print an object or string.
Parameters: - value_to_print (Union[str, object]) – Value to print
- mode (optional[str]) – Defaults to None. Accepts either DISPLAY or HTML
-
utils.print_new_line()
Prints a new line.
-
utils.print_separator()
Prints a separator line of 80 underscores.