21 Data Science Hacks: A Cheat Sheet for Data Science Beginners
Jul 29, 2022
A few commands and code snippets that I use almost every day as a Data Scientist
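- Crop (trim) a video, starting at a specific time and lasting a given duration. Note that -t is a duration, not an end time; ffmpeg's -to option takes an end timestamp instead.
ffmpeg -i <input video path> -ss 00:00:03 -t 00:00:08 -async 1 <output video path>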
- Sort files in a folder in a human way, i.e. 0, 1, 2, 3 … n.
import re

def tryint(s):
    # Digit chunks become ints; everything else stays a string
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """Turn a string into a list of string and number chunks.
    "z23a" -> ["z", 23, "a"]
    """
    return [tryint(c) for c in re.split('([0-9]+)', s)]

def sort_nicely(l):
    """Sort the given list in place, in the way that humans expect."""
    l.sort(key=alphanum_key)

sort_nicely(filename)  # filename is a list of file names
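To see how this differs from a plain sort, here is a minimal self-contained sketch. The file names are made up for illustration, and natural_key is a hypothetical, compact variant of alphanum_key rather than a different technique.

```python
import re

def natural_key(s):
    # re.split with a capture group alternates non-digit and digit chunks,
    # so every odd-indexed chunk is a run of digits and is compared as an int
    return [int(c) if i % 2 else c
            for i, c in enumerate(re.split(r'([0-9]+)', s))]

# Hypothetical file names, only for illustration
names = ["frame10.jpg", "frame2.jpg", "frame1.jpg"]

plain = sorted(names)                     # lexicographic order
natural = sorted(names, key=natural_key)  # human-friendly order

print(plain)    # ['frame1.jpg', 'frame10.jpg', 'frame2.jpg']
print(natural)  # ['frame1.jpg', 'frame2.jpg', 'frame10.jpg']
```

Using sorted with a key returns a new list, which is handy when the original order should be kept.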
- Find the number of files in a folder from the command line.
ls -1 | wc -l
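Keep in mind that this counts every visible entry in the current folder, sub-folders included. If you are already inside Python, a rough equivalent could look like the sketch below. count_entries and its files_only flag are hypothetical names chosen for this example.

```python
from pathlib import Path

def count_entries(folder, files_only=False):
    """Count the non-hidden entries directly inside folder (not recursive)."""
    entries = [p for p in Path(folder).iterdir() if not p.name.startswith(".")]
    if files_only:
        # Drop sub-folders and keep regular files only
        entries = [p for p in entries if p.is_file()]
    return len(entries)

print(count_entries("."))                   # files and sub-folders
print(count_entries(".", files_only=True))  # files only
```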
- Show an image in a Jupyter notebook using matplotlib.
import cv2
import matplotlib.pyplot as plt

im = cv2.imread('image/im.png')
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
plt.imshow(im)
plt.show()
- Rotate a video.
from moviepy.editor import VideoFileClip

# loading the video
clip = VideoFileClip("/input.mp4")
# rotating the clip by 270 degrees
clip = clip.rotate(270)
# saving the clip
clip.write_videofile("/output.mp4")
- Compare two dataframe columns.
import pandas as pd

df = pd.read_csv('/home/garima/Desktop/demo - Sheet1.csv')
count = 0
for i in range(len(df)):
    if df['Actual'][i] == df['Predictions'][i]:
        count += 1  # prediction matches the label
    else:
        print(df['FileName'][i])  # print the file names that were mispredicted
print(count)
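The same comparison can be written without an explicit loop. This is only a sketch: the small DataFrame below is a made-up stand-in for the CSV, reusing the column names from the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for the CSV file; only the column names come from the snippet above
df = pd.DataFrame({
    "FileName": ["a.jpg", "b.jpg", "c.jpg"],
    "Actual": ["cat", "dog", "cat"],
    "Predictions": ["cat", "cat", "cat"],
})

matches = df["Actual"] == df["Predictions"]   # boolean Series, one entry per row
count = int(matches.sum())                    # number of matching rows
mismatched = df.loc[~matches, "FileName"].tolist()

print(count)       # 2
print(mismatched)  # ['b.jpg']
```

On large files the vectorized comparison is usually much faster than indexing row by row.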
- Get the list of all files in a folder and its subfolders.
import os

all_files = os.listdir('/images/')  # top-level entries only (not recursive)
directory = "/images/"
arr = []    # paths of all files
direc = []  # paths of all subfolders
for root, subdirectories, files in os.walk(directory):
    for subdirectory in subdirectories:
        direc.append(os.path.join(root, subdirectory))
    for file in files:
        arr.append(os.path.join(root, file))
print(len(arr), len(direc))
- Rotate all images in a folder.
import os
import imageio
from scipy import ndimage

outPath = "./frame"
path = "./frame"
# iterate through the names of contents of the folder
for image_path in os.listdir(path):
    # create the full input path and read the file
    input_path = os.path.join(path, image_path)
    image_to_rotate = imageio.imread(input_path)
    # rotate the image
    rotated = ndimage.rotate(image_to_rotate, 270)
    # create the full output path and save the file to disk under the same name;
    # outPath equals path here, so the original images are overwritten
    fullpath = os.path.join(outPath, image_path)
    imageio.imsave(fullpath, rotated)
- Check if a string represents an int.
def isint(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

isint("99")
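A few edge cases are worth knowing, because int() is more forgiving than it looks. The sketch below repeats the function so it runs on its own. Note that a non-string input such as None raises TypeError rather than ValueError, so this version would not catch it.

```python
def isint(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

print(isint("99"))    # True
print(isint("-7"))    # True: a leading sign is accepted
print(isint(" 42 "))  # True: surrounding whitespace is ignored
print(isint("4.2"))   # False: not an integer literal
print(isint("abc"))   # False
```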
- Merge multiple PDF files.
from PyPDF2 import PdfMerger

pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']
merger = PdfMerger()
for pdf in pdfs:
    merger.append(pdf)
merger.write("result.pdf")
merger.close()
- Extract and save video frames.
import cv2

vidcap = cv2.VideoCapture('input.mp4')
success, image = vidcap.read()
count = 0
while success:
    cv2.imwrite("frame%d.jpg" % count, image)  # save the frame as a JPEG file
    success, image = vidcap.read()
    print('Read a new frame: ', success)
    count += 1
vidcap.release()
- Create a new Conda environment. For all conda-related commands -> this has been my holy grail!
conda create --name py35 python=3.5
- Remove a Conda environment.
conda env remove -n ENV_NAME
- Stop the Jupyter notebook server running on a given port.
jupyter notebook stop 8888
- Read an image from a URL.
from PIL import Image
import urllib.request

URL = 'url.jpg'  # replace with the full image URL
with urllib.request.urlopen(URL) as url:
    with open('temp.jpg', 'wb') as f:
        f.write(url.read())  # download the image to a temporary file
img = Image.open('temp.jpg')
img.show()
- Crop an image when x, y, w and h are given (top-left corner, width and height).
crop = im[y:y+h,x:x+w]
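The order of the indices trips up many beginners: rows (y) come first, then columns (x). Here is a minimal sketch on a made-up blank image; the image size and the box values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical blank RGB image, 100 pixels high and 200 pixels wide
im = np.zeros((100, 200, 3), dtype=np.uint8)

x, y, w, h = 50, 20, 80, 40   # top-left corner (x, y), width w, height h
crop = im[y:y+h, x:x+w]       # rows (y) first, then columns (x)

print(crop.shape)             # (40, 80, 3)
```

The slice is a view into the original array, so call crop.copy() if you plan to modify the crop without touching the source image.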
- Create a CSV file using values from a NumPy array.
import numpy
a = numpy.asarray([ [2,2,9], [1,5,6], [0,8,9],[1,6,3] ])
numpy.savetxt("test.csv", a, delimiter=",")
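By default savetxt writes every value in scientific notation (its default format is '%.18e'), so 2 ends up as 2.000000000000000000e+00 in the file. If plain integers are wanted, passing fmt="%d" is one option. The sketch below writes to a temporary folder, which is my assumption to keep the example free of side effects.

```python
import os
import tempfile
import numpy as np

a = np.asarray([[2, 2, 9], [1, 5, 6], [0, 8, 9], [1, 6, 3]])

# Write to a temporary folder so the sketch leaves no file behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    np.savetxt(path, a, delimiter=",", fmt="%d")  # fmt="%d" writes plain integers
    with open(path) as f:
        first_line = f.readline().strip()
    loaded = np.loadtxt(path, delimiter=",", dtype=int)  # read it back to check

print(first_line)  # 2,2,9
```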
- Turn an image binary.
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)  # load as 8-bit grayscale
ret, bw_img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
plt.imshow(bw_img, cmap='gray')
- Image dilation and erosion.
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('input.png', 0)    # 0 = load as grayscale
kernel = np.ones((5, 5), np.uint8)  # 5x5 structuring element
img_erosion = cv2.erode(img, kernel, iterations=1)
img_dilation = cv2.dilate(img, kernel, iterations=1)

plt.imshow(img_erosion, cmap='gray')
plt.show()
plt.imshow(img_dilation, cmap='gray')
plt.show()
- Invert black to white and vice versa in an image.
import numpy

# close_img is an existing uint8 image array
inverted = numpy.invert(close_img)
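numpy.invert is a bitwise NOT, so it behaves like 255 - value only for uint8 arrays; on signed integer arrays it would turn 0 into -1 instead. A tiny sketch with a made-up 2x2 image:

```python
import numpy as np

# Hypothetical 2x2 uint8 image: black, white and two gray levels
img = np.array([[0, 255], [100, 200]], dtype=np.uint8)

inverted = np.invert(img)   # bitwise NOT; for uint8 this equals 255 - value

print(inverted.tolist())    # [[255, 0], [155, 55]]
```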
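- Increase/decrease the brightness of an image.
from PIL import Image, ImageEnhance

im = Image.open("images.png")
image = ImageEnhance.Brightness(im)
level = 5.0  # 1.0 keeps the original; below 1.0 darkens, above 1.0 brightens
image.enhance(level).show()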
Conclusion
I hope this miscellaneous collection of snippets helps you at times when even the GOAT, Stack Overflow, only turns up vaguely related questions and answers. This little list has helped me whenever I needed to do a specific task as an ML beginner, so I decided to share it in case it helps a fellow learner.
I will keep adding more snippets as I keep learning and exploring!
Till then, happy coding!