Creating a Dataframe with Conditional Logic Using Boolean Indexes

In this article, we’ll explore how to add a column to a pandas dataframe that flags rows according to conditional logic rules. We’ll start with a row-wise function and then move on to vectorized techniques based on boolean indexes.

Table of Contents

  1. Introduction
  2. Conditional Logic Rules
  3. Basic Approach with in Operator
  4. Short Circuiting Using Boolean Indexes
  5. Using the isin Function for Short Circuiting
  6. Creating a Custom Function for Conditional Logic
  7. Conclusion

Introduction

When working with dataframes in pandas, you often need to apply certain rules or conditions to the data. These rules can be based on various factors such as numerical values, string patterns, or even user-defined functions.

We’ll compare three approaches: a row-wise function run with apply, boolean indexes combined with element-wise operators, and a reusable function that wraps those boolean indexes.

Conditional Logic Rules

Before diving into the code, let’s understand what kind of conditional logic rules we can apply:

  • Numerical values: Compare numerical values in rows or columns.
  • String patterns: Match string patterns using regular expressions.
  • Boolean operations: Combine conditions with and, or, and not (written &, |, and ~ when working on whole columns).
  • User-defined functions: Use custom functions to evaluate conditions.
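As a rough sketch of these four kinds of rules, the snippet below builds one boolean mask for each. The frame and its column names (amount, code) are invented for illustration and are not part of the dataset used later in the article.

```python
import pandas as pd

# Hypothetical data, for illustration only
demo = pd.DataFrame({
    'amount': [50, 150, 300],
    'code': ['AB12', 'xy99', 'CD34'],
})

# Numerical rule: compare a column against a value
is_large = demo['amount'] > 100

# String pattern rule: two capital letters followed by two digits
is_well_formed = demo['code'].str.match(r'^[A-Z]{2}\d{2}$')

# Boolean operations: combine masks element-wise with &, | and ~
large_and_well_formed = is_large & is_well_formed

# User-defined function: evaluated once per value with Series.apply
def is_round_hundred(value):
    return value % 100 == 0

is_round = demo['amount'].apply(is_round_hundred)
```

Each result is a boolean Series with one entry per row, so any of them can be stored as a new column or used to filter the frame.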

Basic Approach with in Operator

One way to apply conditional logic rules is to write a plain Python function that uses the in operator and run it on each row with apply. The function receives one row at a time, so ordinary Python comparisons and the and/or keywords work as usual.

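def f(x, DPD):
    # Only evaluate the rule when the threshold is at least 1
    if DPD >= 1:
        return x['NPA Status'] == 'N' and x['Scheme Type'] in ['CCA', 'ODA', 'LAA']
    else:
        return False

# df is the dataframe to check (a sample is built in the next section)
df['Invalid'] = df.apply(f, args=(10,), axis=1)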

In this example, we define a function f that takes two arguments: x (the row) and DPD (the threshold value). If DPD is greater than or equal to 1, the function returns True when the NPA Status is ‘N’ and the Scheme Type is one of 'CCA', 'ODA', or 'LAA', and False when either check fails. If DPD is below 1, it returns False for every row.

We then apply this function to each row in the dataframe using the apply method.
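To make this concrete, here is a self-contained sketch that runs a row-wise check on a small frame. The sample rows are invented, and the function is renamed row_is_flagged for readability; its logic mirrors f above.

```python
import pandas as pd

# Invented sample rows, for illustration only
sample = pd.DataFrame({
    'NPA Status': ['N', 'Y', 'N'],
    'Scheme Type': ['CCA', 'LAA', 'XYZ'],
})

def row_is_flagged(row, DPD):
    # With a threshold below 1, no row is flagged
    if DPD < 1:
        return False
    # `in` tests membership of a single value in a plain Python list
    return row['NPA Status'] == 'N' and row['Scheme Type'] in ['CCA', 'ODA', 'LAA']

# axis=1 passes one row at a time; args supplies the extra DPD argument
flags = sample.apply(row_is_flagged, args=(10,), axis=1)
print(flags.tolist())  # [True, False, False]
```

Only the first row satisfies both checks. Calling the function once per row is easy to read but tends to be slow on large frames, which is the motivation for the column-wise approaches that follow.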

Short Circuiting Using Boolean Indexes

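Another approach is to use boolean indexes. This involves building one boolean Series per condition, each covering the whole column at once, and then combining them with the element-wise operators & (and), | (or), and ~ (not). Unlike Python’s and/or keywords, these operators always evaluate both sides, so nothing is skipped per row.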

import pandas as pd

df = pd.DataFrame({
    'NPA Status': ['N', 'Y', 'N', 'N'],
    'MSME Classifcation (Sub segment)': ['MICRO', 'MICRO', 'MICRO', 'MICRO'],
    'Contact Number': ['', '6359434643', '6359434643', '6359434643'],
    'Scheme Type': ['CCA', 'LAA', 'ODA', 'LAA']
})

c1 = df['NPA Status'].eq('N')
c2 = df['Scheme Type'].isin(['CCA', 'ODA', 'LAA'])
c3 = df['MSME Classifcation (Sub segment)'].isin(['MICRO', 'SMALL'])

DPD = 10  # threshold value; any non-zero number is truthy
df['Invalid'] = ~((c1 & c2) | c3) if DPD else False

In this example, we create three boolean Series c1, c2, and c3, one per condition. The & operator requires both c1 and c2 to hold, the | operator then also accepts rows where c3 holds, and the ~ operator negates the combined result.

For each row, (c1 & c2) | (c3) is True when both c1 and c2 are met or when c3 is met. After negation, Invalid is True only for rows where neither of those holds. The trailing if ... else False is an ordinary Python conditional expression on the threshold, evaluated once rather than per row: a non-zero threshold such as 10 is truthy, so the mask is used, whereas a threshold of 0 would set the whole column to False.
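A common stumbling block is reaching for Python’s and/or keywords when combining Series. The sketch below, using invented values, shows that and raises an error on a Series, and that each comparison needs its own parentheses because & binds more tightly than ==.

```python
import pandas as pd

# Invented values, for illustration only
status = pd.Series(['N', 'Y', 'N'])
scheme = pd.Series(['CCA', 'LAA', 'XYZ'])

# Python's `and` needs a single truth value, which a Series does not have
try:
    combined = (status == 'N') and (scheme == 'CCA')
except ValueError as exc:
    print('and failed:', exc)

# The element-wise form works; each comparison is wrapped in parentheses
# because & has higher precedence than ==
combined = (status == 'N') & (scheme == 'CCA')

# ~ flips every element of the mask
negated = ~combined
```

Here combined is True only for the first row, and negated is its exact opposite.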

Using the isin Function for Short Circuiting

We can also use the isin method to build boolean indexes. It is the vectorized counterpart of the in operator: instead of testing one value against a list, it tests every value in a column and returns a boolean Series.

c1 = df['NPA Status'].eq('N')
c2 = df['Scheme Type'].isin(['CCA', 'ODA', 'LAA'])
c3 = df['MSME Classifcation (Sub segment)'].isin(['MICRO', 'SMALL'])

DPD = 10  # threshold value; any non-zero number is truthy
df['Invalid'] = ~((c1 & c2) | c3) if DPD else False

In this example, c2 and c3 are built with isin, while c1 uses eq because it compares against a single value. All three are boolean Series aligned with the rows of df.
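It is worth noting why isin is needed at all. Applied directly to a Series, the in operator checks the index labels rather than the values, which is rarely what is wanted. A small sketch with invented values:

```python
import pandas as pd

# Invented values, for illustration only
schemes = pd.Series(['CCA', 'LAA', 'XYZ'])

# `in` on a Series looks at the index labels (0, 1, 2), not the values
print('CCA' in schemes)           # False
print(0 in schemes)               # True
print('CCA' in schemes.tolist())  # True

# isin tests every value against the list and returns a boolean Series
allowed = schemes.isin(['CCA', 'ODA', 'LAA'])

# Prefix with ~ for the "not in" case
not_allowed = ~allowed
```

The allowed mask is True for the first two rows and False for 'XYZ'.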

Creating a Custom Function for Conditional Logic

We can also wrap the conditions in a custom function. The function takes the dataframe and any parameters the rules need, and returns one boolean per row (or a single False when no threshold is given).

def fn(frame, DPD=None):
    if not DPD:
        return False
    c1 = frame['NPA Status'].eq('N')
    c2 = frame['Scheme Type'].isin(['CCA', 'ODA', 'LAA'])
    c3 = frame['MSME Classifcation (Sub segment)'].isin(['MICRO', 'SMALL'])
    # Raw string so that the regex backslashes reach the pattern unchanged
    c4 = frame['Contact Number'].str.match(r'^([0]|\+91)?[6789]\d{9}$')

    return ~((c1 & c2) | (c3) | (c4))

df['Invalid'] = fn(df, DPD=10)

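In this example, we define a custom function fn that takes two parameters: the dataframe frame and an optional threshold value DPD. If DPD is missing or zero, the function returns False, which pandas broadcasts to every row on assignment. Otherwise it returns a boolean Series that is True for rows where none of (c1 and c2), c3, or c4 holds. The condition c4 uses str.match to test the Contact Number against a pattern for ten-digit mobile numbers that start with 6, 7, 8, or 9 and may carry a 0 or +91 prefix.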
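To see what the contact number pattern accepts, the sketch below runs it against a handful of invented numbers. The pattern is the one used in fn, with the single-character class [0] written as a plain 0.

```python
import pandas as pd

# Invented numbers, for illustration only
numbers = pd.Series(['6359434643', '+916359434643', '06359434643',
                     '1234567890', ''])

# Optional 0 or +91 prefix, then a digit from 6-9, then nine more digits
pattern = r'^(0|\+91)?[6789]\d{9}$'

looks_valid = numbers.str.match(pattern)
print(looks_valid.tolist())  # [True, True, True, False, False]
```

The fourth number fails because it starts with 1, and the empty string fails because it has no digits at all.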

Because fn works on whole columns, we call it once on the dataframe and assign the result directly; the apply method is not needed.
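As a quick sanity check, a row-wise function and its column-wise counterpart should give identical results. The sketch below uses invented data and two hypothetical functions, flag_row and flag_frame, that encode the same simplified rule in both styles.

```python
import pandas as pd

# Invented sample rows, for illustration only
sample = pd.DataFrame({
    'NPA Status': ['N', 'Y', 'N', 'N'],
    'Scheme Type': ['CCA', 'LAA', 'XYZ', 'ODA'],
})
schemes = ['CCA', 'ODA', 'LAA']

def flag_row(row):
    # Row-wise version: plain Python comparisons on one row
    return row['NPA Status'] == 'N' and row['Scheme Type'] in schemes

def flag_frame(frame):
    # Column-wise version: one boolean Series for the whole frame
    return frame['NPA Status'].eq('N') & frame['Scheme Type'].isin(schemes)

row_wise = sample.apply(flag_row, axis=1)
column_wise = flag_frame(sample)

# Both approaches should agree on every row
print(row_wise.tolist() == column_wise.tolist())
```

A comparison like this is a cheap way to confirm that a rewrite from apply to boolean indexes has not changed the rule.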

Conclusion

In this article, we explored how to create a dataframe column that applies conditional logic rules. We started with a row-wise function using the in operator and then moved on to column-wise methods that combine boolean indexes with &, |, and ~, either inline or wrapped in a custom function.

By applying these approaches, you can efficiently evaluate complex conditions in pandas dataframes.


Last modified on 2023-05-28