Understanding Python 3.7 Dataclasses



Python dataclasses

dataclasses is a standard library module that provides a decorator and functions for automatically adding generated special methods, such as __init__() and __repr__(), to user-defined classes.

dataclasses is supported in Python version 3.7 and above.

Importing dataclasses module:

from dataclasses import dataclass

dataclasses is a module which contains dataclass. dataclass is a decorator function for classes.

dataclass parameters.

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)

With the default parameters, dataclass generates these three methods:

  • __init__() method for initializing objects,
  • __repr__() for object representation,
  • __eq__() method for equality comparison of class objects.
from dataclasses import dataclass

@dataclass
class Student:
    firstname:str
    lastname:str
    rollno:int
    grade:str

#Instantiating objects __init__() method does this
s1=Student("karthi","Palani",12,"third")
s2=Student("Sarvesh","Palani",15,"first")

#__repr__() method does this
print (s1)
#Output:Student(firstname='karthi', lastname='Palani', rollno=12, grade='third')
print (s2)
#Output:Student(firstname='Sarvesh', lastname='Palani', rollno=15, grade='first')

#__eq__() method does this
print (s1==s2) #Output:False
print(s1!=s2) #Output:True

With dataclass, there is no need to write the __init__(), __repr__() and __eq__() special methods by hand. The dataclass decorator automatically generates these special methods for the user-defined class.

Class attribute names and types are declared using type annotations, like:

firstname:str
lastname:str
rollno:int
grade:str

In a normal class, we have to write the __init__(), __repr__() and __eq__() methods ourselves.

class Student:
    def __init__(self, firstname, lastname, rollno, grade):
        self.firstname = firstname
        self.lastname = lastname
        self.rollno = rollno
        self.grade = grade

    def __repr__(self):
        return f"{self.firstname}-{self.lastname}-{self.rollno}-{self.grade}"

    def __eq__(self, other):
        return (self.firstname, self.lastname, self.rollno, self.grade) == (
        other.firstname, other.lastname, other.rollno, other.grade)


# Instantiating Student Objects
s1 = Student("karthi", "Palani", 12, "third")
s2 = Student("Sarvesh", "Palani", 15, "first")

# objects representation is defined in __repr__() method.
print(s1)
# Output:karthi-Palani-12-third
print(s2)
# Output:Sarvesh-Palani-15-first


# performing equality operations by __eq__() method.
print(s1 == s2)
# Output:False

Parameterized dataclasses:

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
  • order: If True (the default is False), __lt__(), __le__(), __gt__(), and __ge__() methods will be generated.
  • unsafe_hash: If True (the default is False), a __hash__() method will be generated from the fields, even if the objects are mutable.
  • frozen: If frozen is set to True, attributes of class objects are immutable and can't be modified. Default is False.

The examples below define parameterized dataclasses.

  • order is set to True - comparison operations (<, >, ≤, ≥) can be performed on objects
from dataclasses import dataclass,field

@dataclass(order=True)
class Student:
    firstname:str
    lastname:str
    grade:str
    rollno:int

s1=Student("karthi","Palani","third",12)
s2=Student("Sarvesh","Palani","third",12)

#Performing comparison operations (fields are compared in order, like a tuple).
print (s1>s2) #Output:True
print (s1>=s2) #Output:True
print (s1<=s2)#Output:False
print (s1<s2)#Output:False
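
Because order=True generates the comparison methods, a list of such objects can also be sorted directly. The following is a small sketch of that idea; the third student ("Indhu") is made up for illustration and is not part of the example above.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Student:
    firstname: str
    lastname: str
    grade: str
    rollno: int


students = [
    Student("karthi", "Palani", "third", 12),
    Student("Sarvesh", "Palani", "third", 12),
    Student("Indhu", "Palani", "first", 7),  # hypothetical extra student
]

# Objects are compared field by field in definition order, like the tuple
# (firstname, lastname, grade, rollno). Uppercase letters sort before lowercase.
ordered = sorted(students)
print([s.firstname for s in ordered])
# Output: ['Indhu', 'Sarvesh', 'karthi']
```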
  • frozen is set to True - attribute values can't be changed after the object is created.
  • unsafe_hash is set to True

Usually, hash() is applied to immutable objects. A dataclass with the default settings (eq=True, frozen=False) is mutable, so its __hash__ is set to None and calling hash() on an instance raises TypeError. If we still need to hash such objects, we can set unsafe_hash=True, which generates a __hash__() method from the fields. It is called "unsafe" because the hash value changes whenever a field is modified.
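
The error mentioned above can be seen with a minimal sketch like the one below. It uses a cut-down, two-field version of Student (an assumption made only to keep the example short) with the default dataclass settings.

```python
from dataclasses import dataclass


@dataclass  # defaults: eq=True, frozen=False, unsafe_hash=False
class Student:
    firstname: str
    rollno: int


s1 = Student("karthi", 12)

# With the default settings, dataclass sets __hash__ to None.
print(Student.__hash__)  # Output: None

try:
    hash(s1)
    hashable = True
except TypeError as exc:
    hashable = False
    print(exc)  # Output: unhashable type: 'Student'
```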

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Student:
    firstname:str
    lastname:str
    rollno:int
    grade:str

#instantiating objects __init__() method does this
s1=Student("karthi","Palani",12,"third")
s2=Student("Sarvesh","Palani",15,"first")

#hashing a mutable object is allowed because unsafe_hash=True
print (hash(s1)) #Output (varies between runs):-1138310786
#modifying attribute values
s1.firstname="Indhu"
#since an attribute value changed, the hash value changes too
print (hash(s1)) #Output (varies between runs):-124687988
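
The frozen parameter described earlier has no example of its own, so here is a minimal sketch of how it behaves. It reuses the same Student fields; FrozenInstanceError is the exception class exported by the dataclasses module for this case.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Student:
    firstname: str
    lastname: str
    rollno: int
    grade: str


s1 = Student("karthi", "Palani", 12, "third")

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    s1.firstname = "Indhu"
except FrozenInstanceError as exc:
    print(exc)  # Output: cannot assign to field 'firstname'

# The original value is unchanged.
print(s1.firstname)  # Output: karthi
```

FrozenInstanceError is a subclass of AttributeError, so code that already catches AttributeError will also catch it.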

field():

The dataclasses module has a field() function. It allows us to give additional per-field information.

Importing field function:

from dataclasses import dataclass,field

dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None)

As shown above, the MISSING value is a sentinel object used to detect if the default and default_factory parameters are provided. This sentinel is used because None is a valid value for default. No code should directly use the MISSING value.

The parameters to field() are:

  • default
  • default_factory
  • init
  • repr
  • hash
  • compare
  • metadata

1.default

The default parameter in field() specifies the default value for this field. Fields with default values must come after fields without defaults (like default parameters in a function).

rollno:int=field(default=1)

or

rollno:int=1

Here, we are mentioning the default value of rollno as 1

from dataclasses import dataclass, field


@dataclass
class Student:
    firstname: str
    lastname: str
    grade: str
    rollno: int = field(default=1)


# Instantiating objects
# rollno is not given, so it takes the default value 1
s1 = Student("karthi", "Palani", "third")
print(s1)
# Student(firstname='karthi', lastname='Palani', grade='third', rollno=1)


# rollno is given.
s2 = Student("Sarvesh", "Palani", "first", 15)
print(s2)
# Student(firstname='Sarvesh', lastname='Palani', grade='first', rollno=15)

2.default_factory

default_factory accepts a zero-argument callable. It is called each time an object needs a default for this field, and its return value becomes the default value of that attribute.

rollno:int=field(default_factory=get_rollno)

from dataclasses import dataclass,field

def get_rollno():
    return 12


@dataclass(order=True)
class Student:
    firstname:str
    lastname:str
    grade:str
    rollno:int=field(default_factory=get_rollno)

#If rollno is not mentioned, it will take the default value 12
s1=Student("karthi","Palani","third")
s2=Student("Sarvesh","Palani","third",15)


print (s1)
#Output:Student(firstname='karthi', lastname='Palani', grade='third', rollno=12)

print (s2)
#Output:Student(firstname='Sarvesh', lastname='Palani', grade='third', rollno=15)
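
A common use of default_factory is a mutable default such as a list. The sketch below assumes a hypothetical subjects field, which is not part of the Student class used elsewhere in this post, to show that each object gets its own list.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    firstname: str
    lastname: str
    # Writing `subjects: list = []` raises ValueError when the class is created.
    # default_factory=list builds a new, separate list for every object instead.
    subjects: list = field(default_factory=list)  # hypothetical field


s1 = Student("karthi", "Palani")
s2 = Student("Sarvesh", "Palani")

s1.subjects.append("maths")

print(s1.subjects)  # Output: ['maths']
print(s2.subjects)  # Output: []
```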

3.init

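By default, init is set to True. If init is set to False, this field is not a parameter of the generated __init__() method, so it cannot be passed while instantiating an object. Such a field should get its value from a default or from __post_init__().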

from dataclasses import dataclass, field


@dataclass
class Student:
    firstname: str
    lastname: str
    grade: str
    rollno: int = field(init=False, default=5)


# Instantiating objects
s1 = Student("karthi", "Palani", "third")
print(s1)
# Output:Student(firstname='karthi', lastname='Palani', grade='third', rollno=5)

4.repr

By default, repr is set to True, which means the object representation includes this field. To exclude a field from the object representation, set repr=False.

from dataclasses import dataclass, field


@dataclass
class Student:
    firstname: str
    lastname: str = field(repr=False)
    grade: str = field(repr=False)
    rollno: int


# Instantiating objects

s1 = Student("karthi", "Palani", "third", 12)
# only firstname and rollno will be displayed in object representation
print(s1)
# Output:Student(firstname='karthi', rollno=12)


s2 = Student("Sarvesh", "Palani", "first", 15)
print(s2)
# Output:Student(firstname='Sarvesh', rollno=15)

5.hash

It can have a bool or None value. If we set hash=True, this field is included in the generated __hash__() method. If it is set to None, the value of the compare parameter is used, which is normally what we want: a field that takes part in comparisons should also take part in the hash. Default is None.

rollno:int=field(hash=True)
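
To see the effect of the hash parameter, here is a small sketch in which one field is left out of the hash but still takes part in equality. It uses unsafe_hash=True so that a __hash__() method is generated, and a shortened three-field Student, which is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class Student:
    firstname: str
    rollno: int
    # grade is excluded from __hash__(), but still compared by __eq__()
    grade: str = field(hash=False)


s1 = Student("karthi", 12, "third")
s2 = Student("karthi", 12, "first")

# Only firstname and rollno are hashed, so the hashes match.
print(hash(s1) == hash(s2))  # Output: True
# grade still takes part in equality, so the objects are not equal.
print(s1 == s2)  # Output: False
```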

6.compare

By default, compare is set to True. That means this field is included in comparison and equality operations. If set to False, this field is excluded from comparison and equality operation.

from dataclasses import dataclass, field


@dataclass(order=True)
class Student:
    firstname: str = field(compare=False)
    lastname: str
    rollno: int
    grade: str


# Instantiating object.
s1 = Student("karthi", "Palani", 12, "third")
s2 = Student("Sarvesh", "Palani", 12, "third")

# Returns True because firstname is excluded from comparison (compare=False)
print(s1 == s2)
# Output:True
print(s1 >= s2) # Output:True

7.metadata

It is a dictionary (key-value pairs). metadata is not used by dataclasses itself. It is meant for third-party code that reads or accesses the dataclass, and it gives additional information about this field.

from dataclasses import dataclass,field

@dataclass
class Student:
    firstname:str
    lastname:str
    rollno:int=field(metadata={'student':'register number'})
    grade:str=field(default="third")


s1=Student("karthi","Palani",12)
s2=Student("Sarvesh","Palani",12)

#metadata information is retrieved
print (s1.__dataclass_fields__['rollno'])
#Output: Field(name='rollno',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x0357D3E8>,default_factory=<dataclasses._MISSING_TYPE object at 0x0357D3E8>,
init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'student': 'register number'}),_field_type=_FIELD)
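
The metadata can also be read through the public fields() function of the dataclasses module, instead of the __dataclass_fields__ attribute. A minimal sketch, using the same Student class as above:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Student:
    firstname: str
    lastname: str
    rollno: int = field(metadata={"student": "register number"})
    grade: str = field(default="third")


# fields() returns a tuple of Field objects, one per field, in definition order.
# Each Field has a read-only metadata mapping (empty when none was given).
for f in fields(Student):
    print(f.name, dict(f.metadata))
# Output:
# firstname {}
# lastname {}
# rollno {'student': 'register number'}
# grade {}
```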

post_init processing:

While instantiating an object, the generated __init__() method is called. If __post_init__() is defined in the class, the generated __init__() calls it automatically as its last step.

It is used for initializing field values that depend on one or more other fields.

For example: in the Student dataclass, if we want to add a fullname attribute, it can be created by concatenating firstname and lastname. We can define this in the __post_init__() method.

from dataclasses import dataclass, field

@dataclass(order=True)
class Student:
    firstname: str
    lastname: str
    rollno: int
    grade: str
    def __post_init__(self):
        self.fullname=self.firstname+" " +self.lastname

# Instantiating object.
s1 = Student("karthi", "Palani", 12, "third")
s2 = Student("Sarvesh", "Palani", 12, "third")

print (s1.fullname)#Output:karthi Palani
print (s2.fullname)#Output:Sarvesh Palani

asdict(),astuple():

asdict()

Converts a dataclass instance into a dictionary. Each dataclass is converted to a dict of its fields (name-value pairs).

astuple()

Converts a dataclass instance into a tuple. Each dataclass is converted to a tuple of its field values.

We have to import asdict and astuple from the dataclasses module:

from dataclasses import dataclass,field,asdict,astuple

is_dataclass()

dataclasses.is_dataclass(class_or_instance)

Returns True if its parameter is a dataclass or an instance of a dataclass. Otherwise returns False.

from dataclasses import dataclass,field,asdict,astuple,is_dataclass

@dataclass
class Student:
    firstname:str
    lastname:str
    rollno:int
    grade:str
    fullname:str=field(init=False)
    def __post_init__(self):
        self.fullname=self.firstname+" " + self.lastname


#Instantiating object.
s1=Student("karthi","Palani",12,"third")

print (s1)
#Output:Student(firstname='karthi', lastname='Palani', rollno=12, grade='third', fullname='karthi Palani')


#s1(dataclass instance) is converted into dict.
print (asdict(s1))
#Output:{'firstname': 'karthi', 'lastname': 'Palani', 'rollno': 12, 'grade': 'third', 'fullname': 'karthi Palani'}


#s1 is converted into tuple.
print (astuple(s1))
#Output: ('karthi', 'Palani', 12, 'third', 'karthi Palani')

#checks whether s1 is dataclass/dataclass object
print (is_dataclass(s1)) #Output:True
#checks whether Student is dataclass/dataclass object
print (is_dataclass(Student))#Output: True
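
asdict() and astuple() also convert dataclass objects nested inside other dataclass objects. The sketch below assumes a hypothetical Address dataclass and a shortened Student, neither of which appears elsewhere in this post, to show the nested result.

```python
from dataclasses import dataclass, asdict, astuple


@dataclass
class Address:  # hypothetical nested dataclass, for illustration only
    city: str
    pincode: str


@dataclass
class Student:
    firstname: str
    rollno: int
    address: Address


s1 = Student("karthi", 12, Address("Chennai", "600001"))

# The nested Address object becomes a nested dict / tuple.
print(asdict(s1))
# Output: {'firstname': 'karthi', 'rollno': 12, 'address': {'city': 'Chennai', 'pincode': '600001'}}
print(astuple(s1))
# Output: ('karthi', 12, ('Chennai', '600001'))
```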

Conclusion:

  • Fields with default values must come after all fields without defaults. Otherwise, a TypeError is raised when the class is defined.
  • If frozen is set to True (and eq is left as True), a __hash__() method is generated, so objects can be hashed. This is safe because frozen objects are immutable.
  • The three dataclass declarations below are equivalent.
@dataclass
class Student:
    ...
@dataclass()
class Student:
    ...
@dataclass(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)
class Student:
    ...
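
The first two points can be checked with a short sketch like the following. BadStudent and FrozenStudent are hypothetical class names used only for this illustration. The exact wording of the TypeError message differs between Python versions, so it is printed here without an expected output.

```python
from dataclasses import dataclass

error_message = None

# A field without a default (grade) after a field with a default (rollno)
# makes the class definition itself fail with TypeError.
try:
    @dataclass
    class BadStudent:
        firstname: str
        rollno: int = 1
        grade: str
except TypeError as exc:
    error_message = str(exc)
    print(error_message)


# With frozen=True (and the default eq=True), __hash__() is generated.
@dataclass(frozen=True)
class FrozenStudent:
    firstname: str
    rollno: int


f1 = FrozenStudent("karthi", 12)
f2 = FrozenStudent("karthi", 12)

# Equal frozen objects have equal hashes.
print(hash(f1) == hash(f2))  # Output: True
```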

Resources(Python docs):

dataclass 

Variable annotations
