Getting Started with NumPy

The Foundation of Data Science in Python

Vinod Baste
Jun 16, 2024

Welcome to the exciting world of data science! If you’re venturing into this field, one of the first and most essential tools you’ll need to master is NumPy. In this blog, we’ll explore what NumPy is, why it’s so important, and how to perform some basic operations with it. By the end, you’ll have a solid foundation to build upon as you dive deeper into data science.

What is NumPy?

NumPy, short for Numerical Python, is a powerful library for numerical computations in Python. It provides support for creating and manipulating large, multi-dimensional arrays and matrices of numeric data. Almost every other data science library you’ll encounter, such as Pandas, Scikit-Learn, and Seaborn, is built on top of NumPy. This makes it an indispensable tool in the data scientist’s toolkit.

Why NumPy?

At first glance, NumPy arrays might seem similar to Python lists. However, they offer significant advantages:

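  1. Efficiency: NumPy arrays store elements of a single data type in a compact block of memory, so they are typically faster and use less memory than Python lists for numerical work.
  2. Vectorization and broadcasting: NumPy applies operations to entire arrays without explicit Python loops, and broadcasting extends this to arrays of different but compatible shapes, making your code more concise and readable.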
  3. Extensive Functionality: NumPy includes a vast array of functions for mathematical operations, random number generation, linear algebra, and more.
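
To make the second point concrete, here is a minimal sketch comparing a plain Python list with a NumPy array. The variable names (prices, grid, offsets) are made up for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# Plain Python: an explicit loop (here a list comprehension) visits each element.
doubled_list = [p * 2 for p in prices]

# NumPy: the multiplication is applied to the whole array at once.
price_array = np.array(prices)
doubled_array = price_array * 2

# Broadcasting: a (2, 3) array and a (3,) array are combined row by row.
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_array)  # [20. 40. 60.]
print(shifted.tolist())  # [[11, 22, 33], [14, 25, 36]]
```

Note that multiplying a Python list by 2 would repeat the list rather than double its values, which is one reason arrays are the better fit for numerical work.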

Goals for Learning NumPy

  1. Understand NumPy: Learn what it is and why it’s useful.
  2. Create Arrays: Explore various methods to create NumPy arrays.
  3. Retrieve Information: Learn slicing and indexing techniques to retrieve data from arrays.
  4. Basic Operations: Perform basic operations and use universal functions on NumPy arrays.

Getting Started with NumPy

Importing NumPy

Before you can use NumPy, you need to import it. By convention, it is imported as np:

import numpy as np

Creating Arrays

There are several ways to create NumPy arrays. Let’s explore a few common methods.

1. Transforming Python Lists

You can convert a Python list into a NumPy array using the np.array function:

my_list = [1, 2, 3]
my_array = np.array(my_list)
print(my_array) # Output: [1 2 3]
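
The same function also accepts nested lists, which is a common way to build a small two-dimensional array by hand. The sketch below assumes every inner list has the same length; the names are illustrative only.

```python
import numpy as np

# Each inner list becomes one row of the resulting array.
nested_list = [[1, 2, 3], [4, 5, 6]]
matrix = np.array(nested_list)

print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2

# The optional dtype argument sets the element type explicitly.
as_floats = np.array([1, 2, 3], dtype=float)
print(as_floats)     # [1. 2. 3.]
```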

2. Using Built-in Functions

NumPy provides many built-in functions to create arrays directly.

  • np.arange: Similar to Python’s range but returns a NumPy array.
np.arange(0, 10, 2)  # Output: [0 2 4 6 8]
  • np.zeros: Creates an array filled with zeros.
np.zeros((2, 3))  # Output: [[0. 0. 0.] [0. 0. 0.]]
  • np.ones: Creates an array filled with ones.
np.ones((3, 3))  # Output: [[1. 1. 1.] [1. 1. 1.] [1. 1. 1.]]
  • np.linspace: Generates a specified number of evenly spaced values between two points.
np.linspace(0, 10, 5)  # Output: [ 0.   2.5  5.   7.5 10. ]
  • np.eye: Creates an identity matrix.
np.eye(4)  # Output: [[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
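
One detail that is easy to miss when comparing these functions is how each one treats its end point. The short sketch below puts np.arange and np.linspace side by side and checks the shapes of the other results.

```python
import numpy as np

evens = np.arange(0, 10, 2)     # the stop value 10 is excluded
points = np.linspace(0, 10, 5)  # the stop value 10 is included by default
zeros = np.zeros((2, 3))        # the shape is passed as a tuple
identity = np.eye(4)            # 4x4 matrix with ones on the diagonal

print(evens)           # [0 2 4 6 8]
print(points)          # [ 0.   2.5  5.   7.5 10. ]
print(zeros.shape)     # (2, 3)
print(identity.shape)  # (4, 4)
```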

3. Generating Random Data

NumPy’s random module can generate random data.

  • np.random.rand: Generates random numbers from a uniform distribution over [0, 1).
np.random.rand(3, 2)  # 3x2 array of random floats
  • np.random.randn: Generates samples from a standard normal distribution (mean = 0, variance = 1).
np.random.randn(3, 3)  # 3x3 array from standard normal distribution
  • np.random.randint: Generates random integers within a specified range.
np.random.randint(0, 100, (4, 5))  # 4x5 array of random integers from 0 (inclusive) to 100 (exclusive)
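
Because the values change on every run, a quick way to confirm what these functions return is to check shapes and value ranges rather than the numbers themselves. This is a small sketch with illustrative variable names.

```python
import numpy as np

# rand and randn take the dimensions as separate arguments...
uniform = np.random.rand(3, 2)
normal = np.random.randn(3, 3)
# ...while randint takes the output shape as a single tuple.
integers = np.random.randint(0, 100, (4, 5))

print(uniform.shape)   # (3, 2)
print(normal.shape)    # (3, 3)
print(integers.shape)  # (4, 5)

# The lower bound is inclusive and the upper bound is exclusive.
print(bool(uniform.min() >= 0 and uniform.max() < 1))       # True
print(bool(integers.min() >= 0 and integers.max() < 100))   # True
```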

Setting a Seed

Setting a seed allows you to reproduce random results, which is crucial for debugging and sharing your work. Use the np.random.seed function to set the seed:

np.random.seed(42)
np.random.rand(4) # Output: [0.37454012 0.95071431 0.73199394 0.59865848]
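
To see what reproducibility means in practice, the sketch below sets the same seed twice and compares the results. The variable names are illustrative.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(4)

# Setting the same seed again restarts the sequence from the beginning.
np.random.seed(42)
second = np.random.rand(4)

# Without reseeding, the next call continues the sequence with new values.
third = np.random.rand(4)

print(np.array_equal(first, second))  # True
print(np.array_equal(second, third))  # False
```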

Working with NumPy Arrays

Once you have created a NumPy array, you can perform a variety of operations on it. Let’s look at some useful attributes and methods.

Attributes

  • Shape: Returns the dimensions of the array as a tuple.
array.shape  # e.g., (rows, columns) for a 2-D array
  • Data Type: Returns the data type of the array elements.
array.dtype  # e.g., dtype('int64') or dtype('int32'), depending on the platform
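
As a quick illustration of both attributes, the sketch below inspects one integer array and one float array. The array names are made up for the example.

```python
import numpy as np

int_array = np.arange(6).reshape(2, 3)
float_array = np.zeros((2, 3))

# shape is a tuple with one entry per dimension.
print(int_array.shape)    # (2, 3)

# dtype describes the element type; the default integer type depends on the platform.
print(int_array.dtype)
print(float_array.dtype)  # float64
```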

Methods

  • Reshape: Reshapes the array without changing its data.
array = np.arange(25).reshape(5, 5)
  • Max/Min: Returns the maximum or minimum value.
array.max()  # Maximum value
array.min()  # Minimum value
  • argmax/argmin: Returns the index of the maximum or minimum value. For a multi-dimensional array, this is the position in the flattened array unless you pass an axis.
array.argmax()  # Index of maximum value
array.argmin()  # Index of minimum value
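
Here is a small worked example of these methods on a 2-D array. The values are chosen arbitrarily, and np.unravel_index is used as one way to turn the flat index from argmax into a row and column.

```python
import numpy as np

scores = np.array([[3, 9, 1],
                   [7, 2, 8]])

print(scores.max())     # 9
print(scores.min())     # 1

# Without an axis, argmax/argmin index into the flattened array [3 9 1 7 2 8].
print(scores.argmax())  # 1
print(scores.argmin())  # 2

# Convert the flat index back into (row, column) coordinates.
row, col = np.unravel_index(scores.argmax(), scores.shape)
print(int(row), int(col))  # 0 1

# With an axis, the result has one entry per column (axis=0) or per row (axis=1).
print(scores.max(axis=0))     # [7 9 8]
print(scores.argmax(axis=1))  # [1 2]
```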

Conclusion

NumPy is a versatile and powerful library that forms the foundation for data science in Python. From creating arrays to performing complex mathematical operations, NumPy provides the tools needed for efficient and effective data manipulation. As you continue your journey into data science, mastering NumPy will unlock the capabilities of more advanced libraries and techniques.

For more in-depth information and advanced topics, be sure to check out the NumPy documentation.

You can find the source code here.

Stay connected as we dive deeper into more advanced NumPy techniques in the next part of this series. We’ll explore topics such as broadcasting, advanced indexing, and universal functions, and discuss how NumPy integrates with other libraries like Pandas and Matplotlib.

Happy coding, and welcome to the world of data science!
