Data structures in Python

Data structures in Python

About the Article

Computers store and process data with extraordinary speed and accuracy. So, it is highly essential that the data is stored efficiently and can be accessed fast. Also, the processing of data should happen in the smallest possible time, but without losing accuracy. Data structures deal with how the data is organized and held in the memory when a program processes it. It is important to note that, the data that is stored in the disk as part of persistent storage (like relational tables) are not referred to as data structure here. This article is designed for Computer Science graduates as well as Software Professionals who are willing to learn data structures in Python.

Linear Data Structures

These are the data structures that store the data elements in a sequential manner.

Array:

It is a sequential arrangement of data elements paired with the index of the data. Array is a container, which can hold a fix number of items and these items should be of the same type. Most of the data structures make use of arrays to implement the algorithms. The important terms to understand the concept of Array are as follows:

Element− Each item stored in an array is called an element. Index − Each location of an element in an array has a numerical index, which is used to identify the element.

Example

food = [fat, protein, vitamin]
print(food)

Linked List:

Each data element contains a link to another element, along with the data present in it. The list is a most versatile datatype available in Python, which can be written as a list of comma-separated values (items) between square brackets. An important thing about the list is that, items in a list need not be of the same type. Creating a list is as simple as putting different comma-separated values between square brackets.

For example:

list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5 ]
list3 = ["a", "b", "c", "d"]

Similar to string indices, list indices start at 0, and lists can be sliced, concatenated and so on.

Stack:

It is a data structure, which follows only a specific order of operation. LIFO (last in First Out) or FILO(First in Last Out). In the English dictionary, the word stack means arranging objects one over another. It is the same way; memory is allocated in this data structure. It stores the data elements in a similar fashion as a bunch of plates are stored one above another in the kitchen.

So, stack data structure allows operations at one end, which can be called the top of the stack. We can add elements or remove elements only from this end of the stack. In a stack, the element inserted last in sequence will come out first as we can remove it only from the top of the stack. Such a feature is known as the Last In First Out(LIFO) feature.

The operations of adding and removing the elements are known as PUSH and POP. In the following program, we implement it as add and remove functions. We declare an empty list and use the append() and pop() methods, to add and remove the data elements.

stack = []

# append() function to push
# element in the stack
stack.append('g')
stack.append('f')
stack.append('g')

print('Initial stack')
print(stack)

# pop() function to pop
# element from stack in
# LIFO order
print('\nElements popped from stack:')
print(stack.pop())
print(stack.pop())
print(stack.pop())

print('\nStack after elements are popped:')
print(stack)

# uncommenting print(stack.pop())
# will cause an IndexError
# as the stack is now empty


Output

Initial stack
['g', 'f', 'g']

Elements popped from stack:
g
f
g

Stack after elements are popped:
[]

Queue:

It is similar to Stack, but the order of operation is only FIFO (First In First Out). We are familiar with queues in our day-to-day life as we wait for service. The queue data structure also means the same, where the data elements are arranged in a queue.

The uniqueness of the queue lies in the way items are added and removed. The items are allowed at one end but removed from the other end. So, it is a First-in-First-out method.

A queue can be implemented using a python list, where we can use the insert() and pop() methods to add and remove elements. There is no insertion as data elements are always added at the end of the queue.

# Initializing a queue
queue = []

# Adding elements to the queue
queue.append('g')
queue.append('f')
queue.append('g')

print("Initial queue")
print(queue)

# Removing elements from the queue
print("\nElements dequeued from queue")
print(queue.pop(0))
print(queue.pop(0))
print(queue.pop(0))

print("\nQueue after removing elements")
print(queue)

# Uncommenting print(queue.pop(0))
# will raise and IndexError
# as the queue is now empty


outrput

Initial queue
['g', 'f', 'g']

Elements dequeued from queue
g
f
g

Queue after removing elements
[]

Matrix:

It is a two-dimensional data structure in which, the data element is referred to by a pair of indices. Matrix is a special case of a two-dimensional array, where, each data element is strictly the same size. So, every matrix is also a two-dimensional array but not, vice versa. Matrices are very important data structures for many mathematical and scientific calculations.

As we have already discussed, the two-dimensional array data structure in the previous chapter, we will be focusing on data structure operations specific to matrices in this chapter. We will also use the NumPy package for matrix data manipulation.

Non-Linear Data Structures

Apart from linear data structures, we equally have the non-linear data structure. These are the data structures in which, there is no sequential linking of data elements.

Any pair or group of data elements can be linked to each other and can be accessed without a strict sequence

Binary Tree:

It is a data structure, where each data element can be connected to maximum two other data elements and it starts with a root node. Tree represents the nodes connected by edges. It is a non-linear data structure. It has the following properties:

One node is marked as Root node.

  1. Every node other than the root is associated with one parent node.
  2. Each node can have an arbitrary number of chid nodes.

    # A Python class that represents an individual node
    # in a Binary Tree
    class Node:
    def __init__(self,key):
        self.left = None
        self.right = None
        self.val = key
    

    Heap:

It is a special case of Tree data structure, where the data in the parent node is either strictly greater than/ equal to the child nodes or strictly less than its child nodes.

# importing "heap" to implement heap 
import heap

# initializing list
li = [5, 7, 9, 1, 3]



# printing created heap
print ("The created heap is : ",end="")
print (list(li))


output

The created heap is : [1, 3, 9, 7, 5]

Hash Table:

It is a data structure, which is made of arrays associated with each other using a hash function. It retrieves values using keys rather than, an index from a data element. Also, Hash tables are a type of data structure, in which the address or the index value of the data element is generated from a hash function.

That makes accessing the data faster, as the index value behaves as a key for the data value. In other words, Hash table stores key-value pairs but the key is generated through a hashing function. So, the search and insertion function of a data element becomes much faster as the key values themselves become the index of the array which stores the data. In Python, the Dictionary data types represent the implementation of hash tables.

The Keys in the dictionary satisfy the following requirements.

The keys of the dictionary are hashable, i.e. they are generated by hashing function, which generates a unique result for each unique value supplied to the hash function. The order of data elements in a dictionary is not fixed.

Graph:

It is an arrangement of vertices and nodes, where some of the nodes are connected to each other through links.

# A simple representation of graph using Adjacency Matrix
class Graph:
    def __init__(self,numvertex):
        self.adjMatrix = [[-1]*numvertex for x in range(numvertex)]
        self.numvertex = numvertex
        self.vertices = {}
        self.verticeslist =[0]*numvertex

    def set_vertex(self,vtx,id):
        if 0<=vtx<=self.numvertex:
            self.vertices[id] = vtx
            self.verticeslist[vtx] = id

    def set_edge(self,frm,to,cost=0):
        frm = self.vertices[frm]
        to = self.vertices[to]
        self.adjMatrix[frm][to] = cost

        # for directed graph do not add this
        self.adjMatrix[to][frm] = cost

    def get_vertex(self):
        return self.verticeslist

    def get_edges(self):
        edges=[]
        for i in range (self.numvertex):
            for j in range (self.numvertex):
                if (self.adjMatrix[i][j]!=-1):
                    edges.append((self.verticeslist[i],self.verticeslist[j],self.adjMatrix[i][j]))
        return edges

    def get_matrix(self):
        return self.adjMatrix

G =Graph(6)
G.set_vertex(0,'a')
G.set_vertex(1,'b')
G.set_vertex(2,'c')
G.set_vertex(3,'d')
G.set_vertex(4,'e')
G.set_vertex(5,'f')
G.set_edge('a','e',10)
G.set_edge('a','c',20)
G.set_edge('c','b',30)
G.set_edge('b','e',40)
G.set_edge('e','d',50)
G.set_edge('f','e',60)

print("Vertices of Graph")
print(G.get_vertex())

print("Edges of Graph")
print(G.get_edges())

print("Adjacency Matrix of Graph")
print(G.get_matrix())


output

Vertices of Graph

[‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’]

Edges of Graph

[(‘a’, ‘c’, 20), (‘a’, ‘e’, 10), (‘b’, ‘c’, 30), (‘b’, ‘e’, 40), (‘c’, ‘a’, 20), (‘c’, ‘b’, 30), (‘d’, ‘e’, 50), (‘e’, ‘a’, 10), (‘e’, ‘b’, 40), (‘e’, ‘d’, 50), (‘e’, ‘f’, 60), (‘f’, ‘e’, 60)]

Adjacency Matrix of Graph

[[-1, -1, 20, -1, 10, -1], [-1, -1, 30, -1, 40, -1], [20, 30, -1, -1, -1, -1], [-1, -1, -1, -1, 50, -1], [10, 40, -1, 50, -1, 60], [-1, -1, -1, -1, 60, -1]]

Also, data structures are specific to Python language and they give greater flexibility in storing different types of data and faster processing in python environment.

List:

It is similar to an array with the exception, that the data elements can be of different data types. You can have both numeric and string data in a python list.

Example: Creating Python List

List = [1, 2, 3, "GFG", 2.3]
print(List)

Tuple:

Tuples are similar to lists, but they are immutable which means the values in a tuple cannot be modified they can only be read. A tuple is a sequence of immutable Python objects, just like lists. The differences between tuples and lists are, that the tuples cannot be changed unlike lists and tuples use parentheses, whereas lists use square brackets.

Creating a tuple is as simple as putting different comma-separated values. Optionally, you can put these comma-separated values between parentheses also.

For example:

tup1 = ('physics', 'chemistry', 1997, 2000);
tup2 = (1, 2, 3, 4, 5 );
tup3 = "a", "b", "c", "d
```";





 ## Dictionary:

The dictionary contains Key-value pairs as its data elements. In Dictionary each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly braces, like this: {}.

Keys are unique within a dictionary while values may not be. The values of a dictionary can be of any type, but the keys must be of an immutable data type, such as strings, numbers, or tuples.

A simple example is as follows:

#!/usr/bin/python dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'} print "dict['Name']: ", dict['Name'] print "dict['Age']: ", dict['Age'] ```