21 Data Science Hacks: A Cheat Sheet for Data Science Beginners

Garima Nishad
Jul 29, 2022

A few commands and code snippets that I use as a data scientist almost every day.

  • Crop a video from a specific start time to an end time (-to gives the end timestamp; -t would give a duration instead).
ffmpeg -i <input video path> -ss 00:00:03 -to 00:00:08 -async 1 <output video path>
  • Sort files in a folder in a human way, i.e. 0, 1, 2, 3 … n.
import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """Turn a string into a list of string and number chunks.
    "z23a" -> ["z", 23, "a"]
    """
    return [tryint(c) for c in re.split('([0-9]+)', s)]

def sort_nicely(l):
    """Sort the given list in place, in the way that humans expect."""
    l.sort(key=alphanum_key)

sort_nicely(filenames)  # filenames: a list of file names, e.g. from os.listdir()
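A non-mutating variant of the same idea, as a minimal sketch: sorted() with a key function leaves the original list untouched. The helper name natural_key and the sample file names are made up for illustration.

```python
import re

def natural_key(s):
    """Split a string into text and integer chunks: "z23a" -> ["z", 23, "a"]."""
    # re.split with a capture group alternates text, digits, text, ...
    # so every odd-indexed chunk is a run of digits.
    parts = re.split(r"([0-9]+)", s)
    return [int(c) if i % 2 else c for i, c in enumerate(parts)]

# Hypothetical file names, only for illustration
names = ["frame10.jpg", "frame2.jpg", "frame1.jpg"]
print(sorted(names))                   # plain string sort puts frame10 before frame2
print(sorted(names, key=natural_key))  # human-friendly order
```

Because the key always starts with a text chunk and alternates with integers, two keys are compared type by type and never mix strings with numbers.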
  • Find the number of files in a folder from the command line.
ls -1 | wc -l
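If you are already inside Python, roughly the same count can be obtained without the shell. This is a small sketch, not an exact equivalent: the function name count_entries is made up, and names starting with a dot are skipped to mimic what ls shows by default.

```python
import os

def count_entries(folder="."):
    """Count the entries in a folder, skipping hidden (dot) names like `ls -1` does."""
    return sum(1 for name in os.listdir(folder) if not name.startswith("."))

print(count_entries("."))
```

Like the shell command, this counts subfolders as well as files.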
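  • Show an image in a Jupyter notebook using matplotlib.
import cv2
import matplotlib.pyplot as plt
im = cv2.imread('image/im.png')
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))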
  • Rotate a video.
from moviepy.editor import VideoFileClip  # MoviePy 1.x import path

# load the video
clip = VideoFileClip("/input.mp4")

# rotate the clip by 270 degrees
clip = clip.rotate(270)

# saving the clip
clip.write_videofile("/output.mp4")
  • Compare two dataframe columns.
import pandas as pd
df = pd.read_csv('/home/garima/Desktop/demo - Sheet1.csv')
count = 0
for i in range(len(df)):
    if df['Actual'][i] == df['Predictions'][i]:
        count += 1  # the prediction matches the actual label
    else:
        print(df['FileName'][i])  # print the file name of each mismatch
print(count)
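The same comparison can also be done without an explicit loop. The sketch below assumes the column names used above (FileName, Actual, Predictions) and swaps the CSV for a tiny made-up DataFrame so that it runs on its own.

```python
import pandas as pd

# Made-up rows standing in for the CSV file
df = pd.DataFrame({
    "FileName": ["a.jpg", "b.jpg", "c.jpg"],
    "Actual": ["cat", "dog", "cat"],
    "Predictions": ["cat", "cat", "cat"],
})

matches = df["Actual"] == df["Predictions"]          # boolean Series, one entry per row
count = int(matches.sum())                           # number of matching rows
mismatched = df.loc[~matches, "FileName"].tolist()   # file names where the columns differ

print(mismatched)
print(count)
```

On large files the vectorized comparison is usually much faster than indexing row by row.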
  • Get the list of all files in a folder and its subfolders.
import os
directory = "/images/"
arr = []    # full paths of all files
direc = []  # full paths of all subfolders
for root, subdirectories, files in os.walk(directory):
    for subdirectory in subdirectories:
        direc.append(os.path.join(root, subdirectory))
    for file in files:
        arr.append(os.path.join(root, file))

print(len(arr), len(direc))
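For comparison, here is a shorter sketch of the same walk using pathlib from the standard library. The function name list_tree is made up, and the current folder is used only so the example runs anywhere.

```python
from pathlib import Path

def list_tree(directory):
    """Return (files, subdirectories) found anywhere under `directory`."""
    entries = list(Path(directory).rglob("*"))  # every path below the folder
    files = [p for p in entries if p.is_file()]
    dirs = [p for p in entries if p.is_dir()]
    return files, dirs

files, dirs = list_tree(".")
print(len(files), len(dirs))
```

The results are Path objects rather than strings; wrap them in str() if a library expects plain paths.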
  • Rotate all images in a folder.
import os
from scipy import ndimage
import imageio
outPath = "./frame"
path = "./frame"
# iterate through the names of contents of the folder
for image_path in os.listdir(path):
    # create the full input path and read the file
    input_path = os.path.join(path, image_path)
    image_to_rotate = imageio.imread(input_path)
    # rotate the image by 270 degrees
    rotated = ndimage.rotate(image_to_rotate, 270)
    # create the full output path (same file name) and save the file to disk;
    # outPath equals path here, so each original is overwritten
    fullpath = os.path.join(outPath, image_path)
    imageio.imsave(fullpath, rotated)
  • Check if a string represents an int.
def isint(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

isint("99")
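A few edge cases are worth knowing, because int() is more lenient than it looks. The sketch below repeats the function so that it runs on its own; the sample inputs are chosen only for illustration.

```python
def isint(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

print(isint("99"))     # True
print(isint(" -7 "))   # True: int() accepts a sign and surrounding whitespace
print(isint("1_000"))  # True: underscores between digits are allowed (Python 3.6+)
print(isint("3.0"))    # False: a decimal point is not part of an integer literal
print(isint(""))       # False
```

Passing None raises a TypeError instead of returning False, so catch TypeError as well if the input may not be a string.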
  • Merge multiple pdf files.
from PyPDF2 import PdfMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

merger = PdfMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("result.pdf")
merger.close()
  • Extract and save video frames.
import cv2
vidcap = cv2.VideoCapture('input.mp4')
success, image = vidcap.read()
count = 0
while success:
    cv2.imwrite("frame%d.jpg" % count, image)  # save the current frame as a JPEG
    success, image = vidcap.read()
    print('Read a new frame: ', success)
    count += 1
  • Create a new Conda environment. For all conda-related commands, this has been my holy grail!
conda create --name py35 python=3.5
  • Remove a Conda environment.
conda env remove -n ENV_NAME
  • Stop a Jupyter notebook server running on a given port.
jupyter notebook stop 8888
  • Read an image from a URL.
from PIL import Image
import urllib.request

URL = 'url.jpg'

with urllib.request.urlopen(URL) as url:
    with open('temp.jpg', 'wb') as f:
        f.write(url.read())

img = Image.open('temp.jpg')

img.show()
  • Crop an image if x, y, w, and h are given.
crop = im[y:y+h,x:x+w]
  • Create a CSV file using values from a Numpy array.
import numpy
a = numpy.asarray([ [2,2,9], [1,5,6], [0,8,9],[1,6,3] ])
numpy.savetxt("test.csv", a, delimiter=",")
  • Turn an image binary.
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)  # load as grayscale
ret, bw_img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
plt.imshow(bw_img, cmap='gray')
  • Image dilation and erosion.
import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread('input.png', 0)  # 0 = load as grayscale
kernel = np.ones((5,5), np.uint8)
img_erosion = cv2.erode(img, kernel, iterations=1)
img_dilation = cv2.dilate(img, kernel, iterations=1)
plt.imshow(img_erosion, cmap='gray')
plt.show()
plt.imshow(img_dilation, cmap='gray')
plt.show()
  • Invert black to white and vice versa in an image.
import numpy
inverted = numpy.invert(close_img)  # close_img: a uint8 image array (0 <-> 255)
  • Increase/decrease the brightness of an image.
from PIL import Image, ImageEnhance

im = Image.open("images.png")
image = ImageEnhance.Brightness(im)
level = 5.0  # 1.0 keeps the original; lower is darker, higher is brighter
image.enhance(level).show()

Conclusion

I hope this miscellaneous collection of snippets helps you when Stack Overflow, the GOAT, returns only vague question-and-answer suggestions. This little list has helped me whenever I needed to do a specific task as an ML beginner, so I decided to share it in case it helps a fellow learner.

I will keep adding more snippets as I keep learning and exploring!

Till then, Happy coding!

Photo by Christopher Gower on Unsplash


Garima Nishad

A Machine Learning Research scholar who loves to moonlight as a blogger.