21 Data Science Hacks: A Cheat Sheet for Data Science Beginners

Jul 29, 2022

A few commands and code snippets that I use as a data scientist almost every day.

Trim a video from a specific start time to an end time (-to takes the end timestamp; -t would take a duration instead).

ffmpeg -i <input video path> -ss 00:00:03 -to 00:00:08 -async 1 <output video path>

Sort files in a folder in natural (human) order, i.e. 0, 1, 2, 3, …, n.

import re

def tryint(s):
    # convert digit chunks to int; leave everything else as a string
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
    "z23a" -> ["z", 23, "a"]
    """
    return [tryint(c) for c in re.split('([0-9]+)', s)]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)

# filenames is a list of file names; it is sorted in place
sort_nicely(filenames)
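
If the original list should stay untouched, the same idea works with `sorted`, which returns a new list. Below is a minimal sketch; the name `natural_key` and the sample file names are assumptions made up for illustration.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks: "z23a" -> ["z", 23, "a"]."""
    parts = re.split(r"([0-9]+)", name)
    # With a capturing group, the digit runs always land at odd indices.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

# Hypothetical file names, in the order a plain string sort might leave them.
frames = ["frame10.jpg", "frame2.jpg", "frame1.jpg"]
ordered = sorted(frames, key=natural_key)  # `frames` itself is not modified
print(ordered)
```

Because string and integer chunks always alternate in the same positions, two keys can be compared without mixing types.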

Find the number of files in a folder from the command line.

ls -1 | wc -l
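
Note that `ls -1 | wc -l` counts every visible entry, subfolders included, and skips hidden files. When only regular files should be counted, a small Python sketch may help; `count_files` is a hypothetical helper name, not part of any library, and unlike `ls` it also counts hidden files.

```python
from pathlib import Path

def count_files(folder):
    """Count regular files directly inside `folder` (subfolders are not counted)."""
    return sum(1 for entry in Path(folder).iterdir() if entry.is_file())

print(count_files("."))  # number of files in the current folder
```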

Show an image in a Jupyter notebook using matplotlib.

import cv2
import matplotlib.pyplot as plt
im = cv2.imread('image/im.png')
# OpenCV loads images as BGR; matplotlib expects RGB
plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))

Rotate a video.

from moviepy.editor import VideoFileClip  # import path used by moviepy 1.x

# load the video
clip = VideoFileClip("/input.mp4")

# rotate the clip by 270 degrees
clip = clip.rotate(270)

# save the clip
clip.write_videofile("/output.mp4")

Compare two dataframe columns.

import pandas as pd
df = pd.read_csv('/home/garima/Desktop/demo - Sheet1.csv')
count = 0
for i in range(len(df)):
    if df['Actual'][i] == df['Predictions'][i]:
        count +=1
    else:
        print(df['FileName'][i])
print(count)
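
The same comparison can also be written without an explicit loop by comparing the two columns element-wise. This is a minimal sketch on a small frame; the column names follow the snippet above, while the data is invented for illustration.

```python
import pandas as pd

# Invented sample data standing in for the CSV file.
df = pd.DataFrame({
    "FileName": ["a.jpg", "b.jpg", "c.jpg"],
    "Actual": ["cat", "dog", "cat"],
    "Predictions": ["cat", "cat", "cat"],
})

matches = df["Actual"] == df["Predictions"]          # boolean Series, one entry per row
count = int(matches.sum())                           # number of matching rows
mismatched = df.loc[~matches, "FileName"].tolist()   # files where the columns differ

print(count)
print(mismatched)
```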

Get the list of all files in a folder and its subfolders.

import os
directory = "/images/"
arr = []    # full paths of all files
direc = []  # full paths of all subfolders
for root, subdirectories, files in os.walk(directory):
    for subdirectory in subdirectories:
        direc.append(os.path.join(root, subdirectory))
    for file in files:
        arr.append(os.path.join(root, file))

print(len(arr), len(direc))
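
`pathlib` offers a more compact way to gather the same two lists. The sketch below assumes a helper name, `list_tree`, chosen purely for illustration, and builds a tiny throwaway folder so it can run anywhere.

```python
import tempfile
from pathlib import Path

def list_tree(directory):
    """Return (files, subdirectories) found anywhere under `directory`."""
    entries = list(Path(directory).rglob("*"))
    files = [str(p) for p in entries if p.is_file()]
    subdirs = [str(p) for p in entries if p.is_dir()]
    return files, subdirs

# Build a small demo tree: one subfolder and two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").touch()
    (Path(tmp) / "sub" / "b.txt").touch()
    files, subdirs = list_tree(tmp)
    print(len(files), len(subdirs))  # 2 1
```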

Rotate all images in a folder.

from scipy import ndimage
import os
import imageio
outPath = "./frame"
path = "./frame"
# iterate through the names of contents of the folder
for image_path in os.listdir(path):
    # create the full input path and read the file
    input_path = os.path.join(path, image_path)
    image_to_rotate = imageio.imread(input_path)
    # rotate the image by 270 degrees
    rotated = ndimage.rotate(image_to_rotate, 270)
    # create the full output path (same file name) and save the file to disk;
    # because outPath equals path here, each original image is overwritten
    fullpath = os.path.join(outPath, image_path)
    imageio.imsave(fullpath, rotated)

Check if a string represents an int.

def isint(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

isint("99")
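
A few edge cases are worth knowing: `int()` accepts surrounding whitespace and a sign, rejects decimal strings, and raises `TypeError` rather than `ValueError` for `None`. The sketch below is a variant that also catches `TypeError`; the name `is_int` is an assumption, chosen so it does not clash with the function above.

```python
def is_int(value):
    """Return True if `value` can be parsed by int(), False otherwise."""
    try:
        int(value)
        return True
    except (ValueError, TypeError):
        return False

print(is_int("99"))    # True
print(is_int(" -7 "))  # True: whitespace and a sign are accepted
print(is_int("3.5"))   # False: not an integer literal
print(is_int(None))    # False: the TypeError is caught
```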

Merge multiple pdf files.

from PyPDF2 import PdfMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

merger = PdfMerger()

for pdf in pdfs:
    merger.append(pdf)

merger.write("result.pdf")
merger.close()

Extracting and Saving Video Frames.

import cv2
vidcap = cv2.VideoCapture('input.mp4')
success,image = vidcap.read()
count = 0
while success:
  cv2.imwrite("frame%d.jpg" % count, image)     
  success,image = vidcap.read()
  print('Read a new frame: ', success)
  count += 1

Create a new Conda environment. For all conda-related commands -> this has been my holy grail!

conda create --name py35 python=3.5

Remove a Conda environment.

conda env remove -n ENV_NAME

Stop the Jupyter notebook server running on a given port.

jupyter notebook stop 8888

Read an image from a URL.

from PIL import Image
import urllib.request

URL = 'url.jpg'

with urllib.request.urlopen(URL) as url:
    with open('temp.jpg', 'wb') as f:
        f.write(url.read())

img = Image.open('temp.jpg')

img.show()

Crop an image given x, y, w, and h (top-left corner, width, and height).

crop = im[y:y+h,x:x+w]
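
Images loaded with OpenCV are NumPy arrays indexed as `[row, column]`, which is why `y` comes first in the slice. Here is a small sketch on a synthetic array; the sizes and box values are made up for illustration.

```python
import numpy as np

# Synthetic 100x200 grayscale "image": 100 rows (height), 200 columns (width).
im = np.arange(100 * 200, dtype=np.uint32).reshape(100, 200)

x, y, w, h = 50, 10, 30, 20   # top-left corner (x, y), width w, height h
crop = im[y:y + h, x:x + w]

print(crop.shape)  # (20, 30): height first, then width
```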

Create a CSV file using values from a Numpy array.

import numpy
a = numpy.asarray([ [2,2,9], [1,5,6], [0,8,9],[1,6,3] ])
numpy.savetxt("test.csv", a, delimiter=",")
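
By default `numpy.savetxt` writes every value in scientific notation (for example `2.000000000000000000e+00`). For integer data, passing `fmt="%d"` keeps the file readable. The sketch below writes to a temporary folder so it leaves nothing behind.

```python
import os
import tempfile
import numpy as np

a = np.asarray([[2, 2, 9], [1, 5, 6], [0, 8, 9], [1, 6, 3]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    np.savetxt(path, a, delimiter=",", fmt="%d")  # write integers as plain digits
    with open(path) as f:
        first_line = f.readline().strip()

print(first_line)  # 2,2,9
```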

Turn image binary.

import cv2
import matplotlib.pyplot as plt
img = cv2.imread('test.jpg', 2)
ret, bw_img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
plt.imshow(bw_img, cmap='gray')

Image dilation and erosion.

import cv2
import numpy as np
import matplotlib.pyplot as plt
img = cv2.imread('input.png', 0)
kernel = np.ones((5,5), np.uint8)
img_erosion = cv2.erode(img, kernel, iterations=1)
img_dilation = cv2.dilate(img, kernel, iterations=1)
plt.imshow(img_erosion, cmap='gray')
plt.figure()  # new figure so the second image does not replace the first
plt.imshow(img_dilation, cmap='gray')

Invert black to white and vice versa in an image.

numpy.invert(close_img)
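
For an 8-bit image, `numpy.invert` flips every bit, which maps 0 to 255 and 255 to 0 (the same as `255 - pixel`). A sketch on a tiny made-up array standing in for `close_img`:

```python
import numpy as np

# Made-up 2x2 8-bit "image" used in place of a real one.
close_img = np.array([[0, 255], [100, 200]], dtype=np.uint8)

inverted = np.invert(close_img)
print(inverted.tolist())  # [[255, 0], [155, 55]]
```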

Increase/ Decrease the brightness of an image.


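from PIL import Image, ImageEnhance
im = Image.open("images.png")
enhancer = ImageEnhance.Brightness(im)
level = 5.0  # 1.0 keeps the original; lower values darken, higher values brighten
enhancer.enhance(level).show()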

Conclusion

I hope this miscellaneous collection of snippets helps when StackOverflow, the GOAT, only turns up vague question-and-answer suggestions. This little list has helped me whenever I needed to do a specific task as an ML beginner, and I thought it might help a fellow learner, so I decided to share it.

I will keep adding more snippets as I keep learning and exploring!

Till then, Happy coding!


Written by Garima Nishad