K-Means Clustering

K-Means clustering is a clustering method where we specify in advance how many clusters to look for in our data. This count is passed to the model as the n_clusters parameter.

from sklearn.cluster import KMeans

# Set the values for your clusters
kmeans = KMeans(n_clusters=n, n_init='auto', random_state=state)  # n_init can be 'auto' or a number
# Fit the model to the data
kmeans.fit(data)
# Get the cluster assignment for each point
y_kmeans = kmeans.predict(data)
for i in range(n):  # n is the number of clusters
    print(f"Cluster {i+1} contains {len(data[y_kmeans == i])} points")
print(f"Inertia: {kmeans.inertia_}")  # within-cluster sum of squared distances
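Putting the snippet above together as a runnable sketch, with synthetic data from make_blobs standing in for a real dataset (the data, n, and state values here are illustrative assumptions; n_init is given as a number so the code also runs on scikit-learn versions older than 1.2, where 'auto' is not accepted):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: 300 points around 3 synthetic centers
data, _ = make_blobs(n_samples=300, centers=3, random_state=42)

n = 3       # number of clusters to look for
state = 42  # seed for reproducible results

kmeans = KMeans(n_clusters=n, n_init=10, random_state=state)
kmeans.fit(data)
y_kmeans = kmeans.predict(data)

for i in range(n):
    print(f"Cluster {i+1} contains {len(data[y_kmeans == i])} points")
print(f"Inertia: {kmeans.inertia_}")
```

The three printed cluster sizes sum to the total number of points, since every point is assigned to exactly one cluster.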

Elbow Method

Since K-Means clustering requires us to specify the number of clusters to look for, we need a method for finding the optimal number of clusters. We can use the elbow method for this.
We can plot our elbow graph using something like this:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_diagram(data, max_clusters=8):
    wcss = []
    for i in range(1, max_clusters + 1):
        kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=50, n_init=10)
        kmeans.fit(data)
        wcss.append(kmeans.inertia_)  # within-cluster sum of squares
    plt.plot(range(1, max_clusters + 1), wcss)
    plt.title('The Elbow Method Graph')
    plt.xlabel('Number of clusters')
    plt.xticks(range(1, max_clusters + 1))
    plt.ylabel('WCSS')
    plt.show()

elbow_diagram(reduced, max_clusters=10)

We could also automate this by tracking the change in inertia. Once the relative drop in inertia from one step to the next falls below the relative increase in the number of clusters, the slope has clearly begun to flatten out.
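A sketch of that stopping rule is below. The find_elbow name and the synthetic test data are illustrative assumptions, not a scikit-learn feature. One caveat: the comparison only becomes meaningful from k = 2 onward, since the relative drop in inertia can never exceed 100%, while going from 1 to 2 clusters is a 100% relative change in the cluster count, so the rule would always stop at k = 1 otherwise.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def find_elbow(data, max_clusters=8):
    # Compute the inertia (WCSS) for each candidate cluster count
    wcss = []
    for k in range(1, max_clusters + 1):
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
        kmeans.fit(data)
        wcss.append(kmeans.inertia_)
    # Scan for the first step where the relative drop in inertia is
    # smaller than the relative increase in the number of clusters
    for k in range(2, max_clusters):
        inertia_change = (wcss[k - 1] - wcss[k]) / wcss[k - 1]
        cluster_change = 1 / k  # (k+1 - k) / k
        if inertia_change < cluster_change:
            return k  # the curve flattens after k clusters
    return max_clusters

# Three well-separated synthetic clusters for a quick check
data, _ = make_blobs(n_samples=300,
                     centers=[[0, 0], [50, 0], [100, 0]],
                     cluster_std=1.0, random_state=42)
print(find_elbow(data))
```

On data this cleanly separated, the rule recovers the true cluster count; on messier real data the detected elbow is more sensitive to noise, so it is worth eyeballing the plotted curve as well.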