# Generating Shakespeare

You will create a small RNN that learns to write Shakespeare-style text letter by letter. Unfortunately, these types of models take a very long time to train (hours), even on a decent GPU, so your results today in class won't be optimal. They may still impress you.

First, load the dataset from the internet.

In [None]:
import requests

# Download the file
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
text = response.text

# Print some info
print("Downloaded Shakespeare text. Length:", len(text), "characters")
print(text[:100])


You need to transform this into an array of integers instead of characters. Use the sklearn `LabelEncoder`. You should find 65 distinct characters. To be sure, print out every integer code and the character it corresponds to. *If you want*, you can lowercase all the letters first, which shrinks the alphabet and may speed up training somewhat.
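
As a rough illustration of the encoding step, here is a minimal sketch on a toy string rather than the full text. The names `sample`, `encoder`, `encoded`, and `decoded` are made up for this example; `alphabet_size` matches the name used in the model outline later on.

```python
from sklearn.preprocessing import LabelEncoder

sample = "to be or not to be"  # toy stand-in for the downloaded text

encoder = LabelEncoder()
# LabelEncoder expects a sequence of items, so split the string into characters
encoded = encoder.fit_transform(list(sample))

# Print each integer code next to the character it represents
for code, char in enumerate(encoder.classes_):
    print(code, repr(char))

alphabet_size = len(encoder.classes_)

# inverse_transform maps integer codes back to characters
decoded = "".join(encoder.inverse_transform(encoded))
```

Keeping the fitted encoder around matters later: the same object turns a seed phrase into integers and turns sampled integers back into characters.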

In [None]:
# your code

Now, as you did last class, convert this single array into X, y pairs, where each row of X is a string of characters and each y is the next character. For example:

'to be or not to b', 'e'
'what light throug', 'h'

You can choose how long you want each X string to be (64, 128, or 256; something in this range is reasonable). Shorter strings are faster to train on. Longer strings make a smarter model.
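
One possible way to build the pairs is with NumPy's `sliding_window_view`, sketched below on a toy array. The helper name `make_windows` and the parameter `seq_len` are assumptions made for this example, not names from the class material.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(encoded, seq_len):
    """Split a 1D array of character codes into (X, y) pairs."""
    # Each window holds seq_len input characters plus the one that follows
    windows = sliding_window_view(encoded, seq_len + 1)
    X = windows[:, :-1]  # the first seq_len characters of each window
    y = windows[:, -1]   # the next character, used as the target
    return X, y

encoded = np.arange(10)  # toy stand-in for the encoded text
X, y = make_windows(encoded, seq_len=3)
print(X.shape, y.shape)
print(X[0], y[0])
```

The windows are read-only views into the original array, so building them is cheap. Reshaping or shuffling them later will generally make a full copy, which is worth keeping in mind for long sequences.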

In [None]:
# your code

Create a train/test split by using the first 80% (say) of the data for training and the rest for testing.
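
A minimal sketch of this kind of split, using toy arrays in place of the real `X` and `y` (the variable `split` is just an illustrative name):

```python
import numpy as np

X = np.arange(40).reshape(10, 4)  # toy stand-in: 10 sequences of length 4
y = np.arange(10)                 # toy stand-in: one target per sequence

# The first 80% of the rows go to training, the remainder to testing
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(X_train.shape, X_test.shape)
```

Splitting by position rather than at random is a sensible choice here: neighbouring windows overlap almost entirely, so a random split would put near-duplicates of the training rows into the test set.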

In [None]:
# your code

Input to an RNN needs to be a 3D tensor. You will probably need to reshape your data.

```python
# Reshape the input data for LSTM (samples, timesteps, features)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
```

For example, if `X_train.shape` is `(1000, 100, 1)`, then you have 1000 phrases, each of length 100. The trailing `1` is the number of features per timestep (one integer per character) and is what makes the array a 3D tensor.

In [None]:
# your code

Define your RNN. Use one recurrent layer -- you can choose SimpleRNN, LSTM, or GRU, which all have similar semantics. Here is an outline:

```python
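from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import GRU, Dense

# Define the recurrent model (a GRU here; LSTM or SimpleRNN can be swapped in)
model = Sequential()
model.add(Input([None, 1]))  # sequences of any length, 1 feature per timestep
model.add(GRU(128))  # 128 hidden units in one GRU layer
model.add(Dense(alphabet_size, activation='softmax'))  # one probability per character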
```

The input is a sequence of *any length* (hence the `None`) with one feature per timestep (the character code). The output is a probability distribution over the alphabet, with one value per character. Train this using cross-entropy loss (`sparse_categorical_crossentropy` if your `y` holds integer codes rather than one-hot vectors) and the Adam optimizer. You can pick any batch size (larger is faster; keep an eye on GPU memory usage). Don't expect super high accuracy, and train for only a few epochs (10 or fewer, maybe far fewer; start with 1).

In [None]:
# Your code

## Testing the model

This is a bit trickier than what we've done before. You need to process an input phrase, convert it to an array of ints, feed it to the model, get the output probabilities, turn them back into logits, rescale those by the temperature to define a new probability distribution,
select an element according to that distribution, append the result to the input, and then repeat this in a loop until you have generated as much output as you want. We can break this down into pieces.

First, write `next_char(text, temp)`, which returns the single next character predicted using `text` as input. Remember to apply the temperature. Here's a snippet that may help:

```python
import numpy as np
import tensorflow as tf

probs = model.predict(x, verbose=0)  # x is assumed to be your encoded text, shaped (1, len(text), 1)
logits = np.log(probs) / temp  # invert the softmax (up to a constant) to get logits, then divide by temp
char_id = tf.random.categorical(logits, num_samples=1)  # applies softmax, then samples; returns a (1, 1) tensor
```
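
To see what the temperature does to the distribution, here is a small NumPy-only sketch. It uses `rng.choice` as a stand-in for `tf.random.categorical`, and the function name `sample_with_temperature` and the toy `probs` vector are assumptions made for this example.

```python
import numpy as np

def sample_with_temperature(probs, temp, rng):
    """Rescale a probability vector by a temperature and sample one index."""
    # Small epsilon (an assumption of this sketch) avoids log(0)
    logits = np.log(probs + 1e-9) / temp
    logits = logits - logits.max()  # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled = scaled / scaled.sum()  # softmax over the rescaled logits
    index = int(rng.choice(len(scaled), p=scaled))
    return index, scaled

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.2, 0.7])  # toy model output over a 3-character alphabet

_, cold = sample_with_temperature(probs, 0.5, rng)  # low temperature sharpens
_, hot = sample_with_temperature(probs, 2.0, rng)   # high temperature flattens
print(cold.round(3), hot.round(3))
```

Temperatures below 1 push probability toward the most likely character, which gives safer and more repetitive text. Temperatures above 1 spread it out, which gives more varied and more error-prone text.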

In [None]:
# your code

Now write `extend_text(text, n_chars, temp)` to add any number of characters to `text` by calling `next_char` repeatedly.
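
The repeated-call loop might look like the sketch below. So that it runs on its own, `next_char` here is a hypothetical stand-in that picks a random character and ignores both the model and the temperature. In the notebook you would call your own model-backed `next_char` instead.

```python
import random

def next_char(text, temp):
    # Hypothetical stand-in: a real version would query the trained model
    return random.choice("abcdefgh ")

def extend_text(text, n_chars, temp):
    # Append one character at a time; each new character becomes part
    # of the input used to predict the one after it
    for _ in range(n_chars):
        text += next_char(text, temp)
    return text

print(extend_text("to be or not ", 20, 0.8))
```

Because the whole growing string is fed back in at every step, generation slows down as the text gets longer. Passing only the last few dozen characters to `next_char` is one way to keep each step cheap.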

In [None]:
# your code

Finally, generate some Shakespeare! Experiment with different seeds and seed lengths and temperatures.

## Saving State

When training gets this involved, you really need good practices for saving your work. Here's a callback that saves progress as you train. This is especially important on Colab, which will stop and shut down your session if you don't make it feel special all the time.

```python

from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint_filepath = 'best_shakespeare_model.keras'

model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,  # Save the entire model
    monitor='val_loss',  # Monitor validation loss
    mode='min',  # Save the model when val_loss is minimized
    save_best_only=True  # Only save the best model
)

# Train the model with the callback (validation_split holds out the last 10% of the training data)
history = model.fit(X_train, y_train, epochs=500, validation_split=0.1, callbacks=[model_checkpoint_callback])
```