Creating a Dataframe with Conditional Logic
In this article, we’ll explore how to create a dataframe in pandas that applies various conditional logic rules. We’ll start by understanding the basic concepts and then move on to more advanced techniques using boolean indexes.
Table of Contents
- Introduction
- Conditional Logic Rules
- Basic Approach with in Operator
- Short Circuiting Using Boolean Indexes
- Using the isin Function for Short Circuiting
- Creating a Custom Function for Conditional Logic
Introduction
When working with dataframes in pandas, you often need to apply certain rules or conditions to the data. These rules can be based on various factors such as numerical values, string patterns, or even user-defined functions.
In this article, we’ll explore how to create a dataframe that applies conditional logic rules using different approaches. We’ll start with basic techniques and then move on to more advanced methods.
Conditional Logic Rules
Before diving into the code, let’s understand what kind of conditional logic rules we can apply:
- Numerical values: Compare numerical values in rows or columns.
- String patterns: Match string patterns using regular expressions.
- Boolean operations: Apply boolean operations like and, or, and not.
- User-defined functions: Use custom functions to evaluate conditions.
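The four kinds of rules listed above can be sketched as follows. This is a minimal illustration, not code from the original example: the dataframe, its column names (amount, code), and the helper is_round_hundred are all made up for the demonstration. Note that on pandas Series the boolean operations are written with &, | and ~ rather than the Python keywords and, or, and not.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "amount": [50, 150, 300],
    "code": ["AB12", "zz99", "CD34"],
})

# Numerical values: element-wise comparison returns a boolean Series.
is_large = df["amount"] > 100

# String patterns: str.match anchors the regex at the start of each value.
is_upper_code = df["code"].str.match(r"[A-Z]{2}\d{2}$")

# Boolean operations: & (and), | (or), ~ (not) work element-wise on Series.
large_and_upper = is_large & is_upper_code
not_large = ~is_large

# User-defined functions: apply a plain Python function to each value.
def is_round_hundred(value):
    return value % 100 == 0

round_hundred = df["amount"].apply(is_round_hundred)

print(large_and_upper.tolist())  # [False, False, True]
```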
Basic Approach with in Operator
One way to apply conditional logic rules is to use Python’s in operator inside a row-wise function. The function receives one row at a time and checks whether a value in that row belongs to a list of allowed values.
def f(x, DPD):
    # x is a single row; DPD is the threshold value
    if DPD >= 1:
        return x['NPA Status'] == 'N' and x['Scheme Type'] in ['CCA', 'ODA', 'LAA']
    else:
        return False

# axis=1 passes each row to f; args supplies DPD
df['Invalid'] = df.apply(f, args=(10,), axis=1)
In this example, we define a function f that takes two arguments: x (the row) and DPD (the threshold value). If DPD is greater than or equal to 1, the function returns True only when the NPA Status is 'N' and the Scheme Type is one of 'CCA', 'ODA', or 'LAA'. If DPD is below 1, it returns False for every row.
We then apply this function to each row in the dataframe using the apply method.
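To see the row-wise approach run end to end, here is a small self-contained sketch. The sample rows are assumptions made for the demonstration, and 'XYZ' is a made-up scheme type added only so that one row fails the membership test.

```python
import pandas as pd

# Hypothetical sample rows; 'XYZ' is an invented scheme type.
df = pd.DataFrame({
    "NPA Status": ["N", "Y", "N", "N"],
    "Scheme Type": ["CCA", "LAA", "ODA", "XYZ"],
})

def f(x, DPD):
    # x is one row of the dataframe, DPD is the threshold value.
    if DPD >= 1:
        return x["NPA Status"] == "N" and x["Scheme Type"] in ["CCA", "ODA", "LAA"]
    return False

# With a threshold of 10 the row conditions are evaluated.
df["Invalid"] = df.apply(f, args=(10,), axis=1)
print(df["Invalid"].tolist())  # [True, False, True, False]

# With a threshold below 1 every row is False.
disabled = df.apply(f, args=(0,), axis=1)
print(disabled.tolist())  # [False, False, False, False]
```

Because apply with axis=1 calls the function once per row, this version is easy to read but tends to be slower on large dataframes than the vectorized approaches.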
Short Circuiting Using Boolean Indexes
Another approach is to use boolean indexes. This involves building several boolean Series (masks), one per condition, and combining them element-wise with the logical operators & (and), | (or), and ~ (not).
import pandas as pd

df = pd.DataFrame({
    'NPA Status': ['N', 'Y', 'N', 'N'],
    'MSME Classifcation (Sub segment)': ['MICRO', 'MICRO', 'MICRO', 'MICRO'],
    'Contact Number': ['', '6359434643', '6359434643', '6359434643'],
    'Scheme Type': ['CCA', 'LAA', 'ODA', 'LAA']
})

DPD = 10  # threshold value

# One boolean Series per condition
c1 = df['NPA Status'].eq('N')
c2 = df['Scheme Type'].isin(['CCA', 'ODA', 'LAA'])
c3 = df['MSME Classifcation (Sub segment)'].isin(['MICRO', 'SMALL'])

# The masks are only used when the threshold is truthy
df['Invalid'] = ~((c1 & c2) | (c3)) if DPD else False
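In this example, we create three boolean Series c1, c2, and c3, each evaluating one condition. We combine c1 and c2 with the logical AND operator &, join the result with c3 using the logical OR operator |, and negate everything with the ~ operator.
When the threshold value (10 here) is truthy, the expression (c1 & c2) | (c3) is True for a row if both c1 and c2 hold or if c3 holds. The ~ operator then flips this, so Invalid is True only for rows that satisfy neither alternative. When the threshold is 0 or missing, Python skips the masks and sets the whole column to False. With the sample data above, every row is MICRO, so c3 is always True and Invalid is False for all four rows.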
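The sketch below traces the same mask expression on data where the conditions actually differ from row to row. The rows are hypothetical ('XYZ' and 'LARGE' are invented values), and the variable DPD simply names the threshold. The De Morgan rewrite at the end is an equivalent form, shown only as a cross-check.

```python
import pandas as pd

# Hypothetical rows chosen so that each condition varies.
df = pd.DataFrame({
    "NPA Status": ["N", "Y", "Y", "N"],
    "Scheme Type": ["CCA", "LAA", "XYZ", "XYZ"],
    "MSME Classifcation (Sub segment)": ["LARGE", "LARGE", "MICRO", "LARGE"],
})

DPD = 10  # threshold; any truthy value enables the masks

c1 = df["NPA Status"].eq("N")                                          # [T, F, F, T]
c2 = df["Scheme Type"].isin(["CCA", "ODA", "LAA"])                     # [T, T, F, F]
c3 = df["MSME Classifcation (Sub segment)"].isin(["MICRO", "SMALL"])   # [F, F, T, F]

# A row is flagged when neither (c1 and c2) nor c3 holds.
df["Invalid"] = ~((c1 & c2) | c3) if DPD else False
print(df["Invalid"].tolist())  # [False, True, False, True]

# Equivalent form by De Morgan's laws.
de_morgan = (~c1 | ~c2) & ~c3
print(de_morgan.tolist())  # [False, True, False, True]
```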
Using the isin Function for Short Circuiting
We can also use the isin method to build boolean indexes. It is the vectorized counterpart of Python’s in operator: it checks every value of a Series against a list of allowed values in a single call.
DPD = 10  # threshold value, as in the previous example

c1 = df['NPA Status'].eq('N')
c2 = df['Scheme Type'].isin(['CCA', 'ODA', 'LAA'])
c3 = df['MSME Classifcation (Sub segment)'].isin(['MICRO', 'SMALL'])

df['Invalid'] = ~((c1 & c2) | (c3)) if DPD else False
In this example, c2 and c3 use isin to test membership in a list, while c1 uses eq for a single-value comparison. All three are boolean Series.
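One detail worth illustrating is how isin differs from Python's in operator when applied to a whole Series. The sketch below uses a made-up Series ('XYZ' is an invented value): isin tests each value, whereas in applied directly to a Series tests the index labels, not the values.

```python
import pandas as pd

# Hypothetical values for illustration.
schemes = pd.Series(["CCA", "LAA", "XYZ"])
allowed = ["CCA", "ODA", "LAA"]

# isin: element-wise membership test, returns a boolean Series.
mask = schemes.isin(allowed)
print(mask.tolist())  # [True, True, False]

# `in` on a Series looks at the index labels (0, 1, 2 here), not the values.
value_in_series = "CCA" in schemes
label_in_series = 0 in schemes
print(value_in_series, label_in_series)  # False True

# The per-value equivalent of isin, written with `in` in a loop.
loop_mask = [value in allowed for value in schemes]
print(loop_mask)  # [True, True, False]
```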
Creating a Custom Function for Conditional Logic
We can also define a custom function to apply conditional logic rules. This involves defining a Python function that takes input parameters and returns a boolean value based on the conditions.
def fn(frame, DPD=None):
    # No threshold (None or 0): nothing is flagged
    if not DPD:
        return False
    c1 = frame['NPA Status'].eq('N')
    c2 = frame['Scheme Type'].isin(['CCA', 'ODA', 'LAA'])
    c3 = frame['MSME Classifcation (Sub segment)'].isin(['MICRO', 'SMALL'])
    # Raw string so that \+ and \d reach the regex engine unchanged
    c4 = frame['Contact Number'].str.match(r'^([0]|\+91)?[6789]\d{9}$')
    return ~((c1 & c2) | (c3) | (c4))

df['Invalid'] = fn(df, DPD=10)
In this example, we define a custom function fn that takes two parameters: the dataframe frame and an optional threshold value DPD. If DPD is missing or zero, the function returns False; otherwise it returns a boolean Series that is True for rows where none of the four conditions hold.
Unlike the apply-based version, fn is called once on the whole dataframe, and its result is assigned directly to the Invalid column.
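The following sketch runs fn on a small dataframe so the effect of each condition is visible. The rows are assumptions made for the demonstration ('XYZ', 'LARGE', and the contact numbers are illustrative values), chosen so that each row is decided by a different condition.

```python
import pandas as pd

def fn(frame, DPD=None):
    # No threshold (None or 0): nothing is flagged.
    if not DPD:
        return False
    c1 = frame["NPA Status"].eq("N")
    c2 = frame["Scheme Type"].isin(["CCA", "ODA", "LAA"])
    c3 = frame["MSME Classifcation (Sub segment)"].isin(["MICRO", "SMALL"])
    # Optional 0 or +91 prefix, then a 10-digit number starting with 6-9.
    c4 = frame["Contact Number"].str.match(r"^([0]|\+91)?[6789]\d{9}$")
    return ~((c1 & c2) | c3 | c4)

# Hypothetical rows: each one is decided by a different condition.
df = pd.DataFrame({
    "NPA Status": ["N", "Y", "Y", "Y"],
    "Scheme Type": ["CCA", "LAA", "XYZ", "XYZ"],
    "MSME Classifcation (Sub segment)": ["LARGE", "LARGE", "LARGE", "SMALL"],
    "Contact Number": ["", "", "+916359434643", "12345"],
})

df["Invalid"] = fn(df, DPD=10)
print(df["Invalid"].tolist())  # [False, True, False, False]

# Without a threshold the function returns the scalar False.
no_threshold = fn(df)
print(no_threshold)  # False
```

Row 0 passes through c1 and c2, row 2 through the contact number pattern, and row 3 through the MSME classification. Only row 1 meets none of the conditions and is flagged.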
Conclusion
In this article, we explored how to create a dataframe that applies conditional logic rules. We started with basic techniques using the in operator and then moved on to more advanced methods like short circuiting using boolean indexes and custom functions.
By applying these approaches, you can efficiently evaluate complex conditions in pandas dataframes.
Last modified on 2023-05-28